//build/ session: JSON Document modeling

Ryan CrawCour and I recorded a fun session for //build/ 2016:

Modeling Data for NoSQL Document Databases

Document databases are non-relational databases that store data as collections of JSON documents, such as:

{
   "id": "P468",
   "title": "Modeling Data for NoSQL Document Databases",
   "speakers": [
      {"name": "David Makogon"},
      {"name": "Ryan CrawCour"}
   ],
   "synopsis": "...",
   "tags": ["data"],
   "level": 200
}

If you hail from the relational database world, this type of embedded, denormalized document might seem a bit jarring!

Turns out: There's quite a bit to consider when modeling documents, especially when the intent is to store and query them in a database. For example:

  • Embedding vs referencing (yep, you can still reference data in other documents)
  • Normalization vs denormalization
  • Homogeneous vs heterogeneous data
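To make the first trade-off concrete, here is a minimal Python sketch of the same session modeled both ways. It is only an illustration: the `speakerIds` property, the `s1`/`s2` ids, and the in-memory `speakers` dictionary are assumptions made up for this example, not part of any particular database's API.

```python
# Embedded: speaker details live inside the session document itself,
# so a single read returns everything needed to display the session.
embedded_session = {
    "id": "P468",
    "title": "Modeling Data for NoSQL Document Databases",
    "speakers": [
        {"name": "David Makogon"},
        {"name": "Ryan CrawCour"},
    ],
}

# Referenced: speakers are separate documents (hypothetical ids "s1", "s2"),
# and the session stores only their ids.
speakers = {
    "s1": {"id": "s1", "name": "David Makogon"},
    "s2": {"id": "s2", "name": "Ryan CrawCour"},
}
referenced_session = {
    "id": "P468",
    "title": "Modeling Data for NoSQL Document Databases",
    "speakerIds": ["s1", "s2"],
}


def speaker_names_embedded(session):
    """Read speaker names straight from the embedded array."""
    return [speaker["name"] for speaker in session["speakers"]]


def speaker_names_referenced(session, speaker_store):
    """Resolve each referenced id with an extra lookup per speaker."""
    return [speaker_store[sid]["name"] for sid in session["speakerIds"]]
```

Both functions return the same names. The difference is where the cost lands: embedding makes reads cheap but duplicates speaker data across sessions, while referencing keeps one copy of each speaker at the price of extra lookups.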

In this short (30-minute) talk, Ryan and I dive into these specific challenges. We also walk through some real-world use cases we've helped our partners solve, such as:

  • Hierarchical data
  • Keyword / tag searching
  • Telemetry
  • Logging
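As a rough illustration of the keyword / tag case, the sketch below searches documents by their `tags` array using a small in-memory index. This is not taken from the talk: the extra sample documents (`X1`, `X2`) and the helper names `build_tag_index` and `find_by_tags` are hypothetical, and a real document database would typically do this work with its own indexes and query language.

```python
from collections import defaultdict

# Sample documents; only "P468" comes from the post, the rest are made up.
sessions = [
    {"id": "P468", "tags": ["data"], "level": 200},
    {"id": "X1", "tags": ["data", "cloud"], "level": 300},
    {"id": "X2", "tags": ["web"], "level": 100},
]


def build_tag_index(docs):
    """Map each tag to the set of document ids that carry it."""
    index = defaultdict(set)
    for doc in docs:
        for tag in doc.get("tags", []):
            index[tag].add(doc["id"])
    return index


def find_by_tags(index, tags):
    """Return ids of documents that carry every tag in `tags`."""
    id_sets = [index.get(tag, set()) for tag in tags]
    if not id_sets:
        return set()
    return set.intersection(*id_sets)


tag_index = build_tag_index(sessions)
```

Calling `find_by_tags(tag_index, ["data", "cloud"])` narrows the result to documents that have both tags, which is the behavior a tag array inside each document is meant to support.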

NoSQL Now! talk summary and links: Polyglot Persistence in Windows Azure

Thanks to those who attended my session on Polyglot Persistence in Windows Azure today at the NoSQL Now! conference in San Jose. Here are slides and a few notes from today's session and follow-on questions.

How will Windows Azure provide NoSQL database support? Today, Azure offers Table Storage as a NoSQL key/value store. Additionally, several partners have begun offering database "as a service" running in Windows Azure. For example, MongoLab and MongoHQ provide Azure-hosted MongoDB, while Cloudant provides Azure-hosted CouchDB.

Some of these databases are available directly through our partners' web portals, while others have also integrated into the Azure Store as part of the Windows Azure portal. Here's an example of MongoLab's MongoDB integrated in the store:

For self-hosting, several partners have built virtual machine images, installable via Azure's VM Depot. Here's Neo Technology's Neo4j 1.8:

What, exactly, is VM Depot? VM Depot is a repository of community-created Linux-based virtual machine images. In terms of NoSQL, there are a few NoSQL database images available today. For example, you'll find Neo4j, MongoDB, Redis, and Riak.

What does VM Depot cost? VM Depot is free: Free to publish images and free to download images to your Azure account.

What are the architectural considerations for integrating multiple NoSQL databases in my app? Are there standard practices? As the Cloud Ninja Polyglot Persistence project demonstrates, you can either make direct database calls or implement an abstraction layer using a pattern such as repository. A repository pattern lets you swap out database engines with reduced impact on your existing code base, although you may still need to adjust your app's data access API.
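Here is a minimal Python sketch of what such an abstraction layer could look like. It is not taken from the Cloud Ninja project: `SessionRepository`, `InMemorySessionRepository`, and `register_session` are hypothetical names, and the in-memory dictionary simply stands in for whichever database engine sits behind the interface.

```python
from abc import ABC, abstractmethod
from typing import Optional


class SessionRepository(ABC):
    """Hypothetical data access interface that the app codes against."""

    @abstractmethod
    def save(self, session: dict) -> None:
        ...

    @abstractmethod
    def get(self, session_id: str) -> Optional[dict]:
        ...


class InMemorySessionRepository(SessionRepository):
    """Stand-in backend; a MongoDB- or Table Storage-backed class
    would implement the same two methods."""

    def __init__(self) -> None:
        self._items: dict = {}

    def save(self, session: dict) -> None:
        # Store a copy so callers cannot mutate stored state by accident.
        self._items[session["id"]] = dict(session)

    def get(self, session_id: str) -> Optional[dict]:
        item = self._items.get(session_id)
        return dict(item) if item is not None else None


def register_session(repo: SessionRepository, session_id: str, title: str) -> Optional[dict]:
    """App-level code: depends only on the interface, not on a database."""
    repo.save({"id": session_id, "title": title})
    return repo.get(session_id)
```

Because `register_session` only sees `SessionRepository`, swapping engines means writing another subclass rather than touching the calling code, which is the reduced impact described above.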

How do I choose a specific NoSQL database implementation? Can you please recommend one? For key/value storage, Azure Table Storage offers massive scale (200TB per namespace) and provides very fast storage and lookup. As for 3rd-party vendor offerings, I really cannot give specific recommendations, but I can offer some food for thought when making your decision:

  • Look at the company's longevity, financials, funding, etc.
  • Does the vendor provide Professional Services support?
  • How big / popular is the community? Consider forums, web presence, conferences, etc.
  • How robust is language support? Does the product offer direct APIs when using a non-supported language?
  • How active is the project? Are there frequent updates? Can you view the code (e.g. OSS)?
  • Will the database engine run on your target OS? Some databases may be Windows-only or Linux-only.

What was that super-cool zooming app you used during your demo??? I was using ZoomIt, written by Mark Russinovich.

Where can I find more information about the stuff you talked about today? Here are some informational links from today's talk:

There are a few more resources we didn't talk about, but that should still be valuable:

Azure Open Platform video series - Episode 1: Open Compute Platform

Last year, my coworker and I visited several cities worldwide, delivering an all-day Windows Azure Open Platform Summit. This one-day event covered compute+networking, data, and the developer story from an open source perspective. This included several languages (.NET, Python, PHP, and Node.js) as well as several NoSQL databases (Azure Table Storage, MongoDB, Cassandra, and Neo4j).

A few months later, we decided to record a 6-part video series covering the highlights of these topics. Each episode runs about 15 minutes. The first two segments were just published, and I'll update this post as the rest of the series comes online.

  • Open Compute Platform: PaaS, IaaS, and Virtual Networks
  • Open Compute Platform: Connectivity; Web Sites

Episode 1, Part 2

Upcoming talk: Polyglot Persistence, May 7

On May 7, I'll be speaking at the monthly Central Maryland Association of .NET Professionals (CMAP) meeting. Our topic will be Polyglot Persistence. What's that all about???

Picture this: You're working on a storage problem, wondering how you're going to shoehorn something into your database. Maybe it's SQL Server. Maybe it's MongoDB or some other NoSQL variant. No matter which database option you choose, there always seem to be situations where data simply doesn't fit right, and it becomes more of a code exercise than a storage exercise.

In this talk, we'll eschew the single-database tradition and look at a new approach gathering steam: Polyglot persistence, which simply means using multiple data storage mechanisms based on the particular needs of your application. While polyglot persistence certainly includes both SQL and NoSQL variants (or even NewSQL), this demo-centric talk will cover NoSQL specifically.

We'll look at (and demo) four fundamental NoSQL types: Key/Value, Document, Column Family, and Graph, and see where their sweet spots are. We'll also work through a mock architecture on the whiteboard and see an example of how multiple databases could be combined in the real world.

If you're in the Columbia, MD area, feel free to come on out, grab some pizza, and (hopefully) enjoy the talk! More info about CMAP may be found on their website.