December 04, 2018

A Complete Walkthrough of Azure Cosmos DB and Why Should You Use It


Azure Cosmos DB is Microsoft’s fully managed, globally distributed and horizontally scalable cloud database service. It’s a multi-model NoSQL database that lets you scale throughput and storage independently across Azure regions. Azure Cosmos DB also has extensive tooling and API support for different programming paradigms, which makes it easier for users with an existing NoSQL or cloud database workload to move to Azure Cosmos DB.

This post explores the features of Azure Cosmos DB that make it a compelling proposition for your business.

NoSQL Databases

NoSQL databases have been around for quite some time now. Unlike the term relational database, NoSQL is an umbrella term that covers different technologies and formats for storing and retrieving data. The primary reasons for preferring a NoSQL database are:

  • Rapidly changing data types: Data is now generated and stored in structured, semi-structured and unstructured formats. Traditional data stores are built around structured data. NoSQL provides efficient storage and query capabilities for semi-structured and unstructured data as well.
  • Schema Constraints: Relational data stores enforce rigid schemas for storing and managing data. The schema often ends up limiting how quickly an application can adapt to changing business needs. NoSQL databases typically don’t require a schema when storing data. However, that doesn’t mean there’s no schema. You can apply a schema to the data when you retrieve it. This means that your application is not locked into a schema and can adapt more easily as requirements change.
  • Performance and Scalability: With huge volumes of data being processed at scale in different applications, relational data stores can struggle to keep up with loads of that size. NoSQL data stores, on the other hand, support scaling out, replication and horizontal partitioning, which enable businesses to deliver high throughput and low latency along with high availability.
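
The point about applying a schema at retrieval time can be sketched in a few lines of Python. This is a minimal illustration and is not tied to any particular database; the sample records and the `read_order` helper are hypothetical names made up for this example.

```python
import json

# Two stored records with different shapes; nothing enforced a schema on write.
stored = [
    '{"id": "1", "customer": "Ada", "total": 42.5}',
    '{"id": "2", "customer": {"name": "Lin", "tier": "gold"}, "total": "17"}',
]

def read_order(raw):
    """Apply the shape the application currently expects, at read time."""
    doc = json.loads(raw)
    customer = doc["customer"]
    # Older records keep the customer as a plain string, newer ones as an object.
    name = customer["name"] if isinstance(customer, dict) else customer
    return {"id": doc["id"], "customer_name": name, "total": float(doc["total"])}

orders = [read_order(raw) for raw in stored]
print(orders)
```

Both records come back in the same shape even though they were stored differently, which is the flexibility the schema bullet describes. The price is that the reading code has to know about every shape that was ever written.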

As mentioned previously, NoSQL includes databases built on different models and technologies. These databases can be broadly grouped into a few major categories:

  • Columnar: Data is stored in groups of column families that are often accessed together. An instance of data can have any number of columns, and these columns are grouped or aggregated as required for data retrieval. Examples: HBase, Cassandra and Google BigTable.
  • Key-Value: Data is represented as a combination of a unique attribute (key) and its related content (value). The application accessing the data is responsible for applying the appropriate context (schema) to the stored data. Examples: Redis, Riak, Berkeley DB, Couchbase, MemcacheDB and Amazon DynamoDB.
  • Document: A document, roughly equivalent to a row in a relational database, is a self-contained hierarchical data structure that contains key-value pairs or nested documents. Documents are typically formatted as XML, JSON or BSON, and related documents are stored together in a collection. Examples: MongoDB, CouchDB, IBM Domino and DocumentDB (the precursor to Azure Cosmos DB).
  • Graph: Data is stored in a Graph database as a network (graph) of entities and relationships. The interpretation of the data is based on the relationship between different entities. So typical data retrieval requires fast traversal through the network to get the desired entities. Examples – Neo4j, OrientDB, and FlockDB
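
To make the four categories more concrete, here is a rough sketch that holds similar customer data in each shape using plain Python structures. It only illustrates the data layouts; the keys, field names and the `neighbours` helper are hypothetical and do not correspond to any product’s API.

```python
# Key-value: an opaque value looked up by a unique key.
key_value = {"customer:1": '{"name": "Ada", "city": "London"}'}

# Columnar (wide-column): row key -> column family -> columns.
columnar = {
    "customer:1": {
        "profile": {"name": "Ada", "city": "London"},
        "activity": {"last_login": "2018-12-01"},
    }
}

# Document: a self-contained, nested structure stored in a collection.
documents = [
    {
        "id": "1",
        "name": "Ada",
        "address": {"city": "London"},
        "orders": [{"sku": "A1", "qty": 2}],
    }
]

# Graph: entities (vertices) plus the relationships (edges) between them.
vertices = {"1": {"name": "Ada"}, "2": {"name": "Lin"}}
edges = [("1", "follows", "2")]

def neighbours(vertex_id, relation):
    """Traverse outgoing edges of one type from a vertex."""
    return [dst for src, rel, dst in edges if src == vertex_id and rel == relation]

print(columnar["customer:1"]["profile"]["city"])
print(documents[0]["orders"][0]["sku"])
print([vertices[v]["name"] for v in neighbours("1", "follows")])
```

Notice how the access pattern changes with the model: a key lookup, a column-family lookup, a walk into a nested document, or a traversal along edges.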

Azure Cosmos DB Features

In 2014, Microsoft introduced its first cloud-based NoSQL database, called DocumentDB, which provided low latency and high throughput. As the name suggests, it was a document-oriented NoSQL database that offered a SQL-like querying interface for retrieving document data. Azure Cosmos DB, introduced in 2017, is the successor to DocumentDB. In addition to the existing DocumentDB capabilities, Microsoft added many more features that make Azure Cosmos DB a truly flexible, scalable and globally distributed cloud-based NoSQL database service.

Let’s look at some of these key features:

Global Distribution

Azure Cosmos DB is an Azure foundational (Ring 0) service, so it’s available by default in every region where Azure is available. You can set up replicas of your Cosmos DB data in any region you want simply by enabling that region from the Azure portal. Your data is then replicated and served to users in that region with guaranteed low latency. Additionally, Cosmos DB provides automatic and manual failover, which enables high availability and disaster recovery.
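
The failover idea can be pictured as a client working through an ordered list of regions and using the first one that is reachable. The sketch below is a toy model under that assumption; it is not how Cosmos DB or its SDKs are implemented, the region names are only examples, and `pick_region` is a hypothetical helper.

```python
def pick_region(preferred, available):
    """Return the first preferred region that is currently available."""
    for region in preferred:
        if region in available:
            return region
    raise RuntimeError("no preferred region is available")

preferred = ["West Europe", "North Europe", "East US"]
available = {"West Europe", "North Europe", "East US"}

first_choice = pick_region(preferred, available)

# Simulate a regional outage: the client falls through to the next region.
available.discard("West Europe")
after_failover = pick_region(preferred, available)

print(first_choice, "->", after_failover)
```

Ordering the list by proximity to your users keeps latency low in the normal case, while the later entries act as the failover targets.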

Performance

Performance in any application is typically measured through latency and throughput. With its global distribution, replication and failover options, Cosmos DB helps your customers get fast response times no matter where they are. Cosmos DB also guarantees throughput based on the capacity you provision. You can provision this throughput at the database level or at the container level.

Pricing

The Azure Cosmos DB pricing model depends on the throughput you require and the storage your data needs. Under this model you reserve throughput capacity and storage based on your estimates, and then scale throughput and storage independently, elastically and globally to suit your application requirements. This lets you balance performance and cost according to your expected load and data storage needs.

Multi-Model and Multi APIs

Azure Cosmos DB is a multi-model database that supports multiple data models through a single integrated platform. As of now, Azure Cosmos DB enables you to create containers that store data as Key-Value, Columnar, Document or Graph data. Along with multiple models, Cosmos DB also gives users the flexibility to choose from a variety of familiar APIs to access the data, such as:

  • SQL API or MongoDB API (for Document databases)
  • Table API (Key-Value databases)
  • Cassandra API (Columnar databases)
  • Gremlin API (Graph databases)

With support for different data models and APIs, Azure Cosmos DB makes it easy to store your data in the format best suited to your application and to query it using tools that you may already be familiar with.

5 Well-defined Consistency Levels

Azure Cosmos DB allows you to choose a consistency level that strikes the balance between latency, throughput and availability that’s appropriate for your needs. The different levels of consistency offered are:

  • Strong Consistency: Ensures consistency across all nodes, in all regions, but this comes at the cost of overall performance.
  • Bounded Staleness Consistency: Provides a means to set how stale the data you read is allowed to be. Reads may lag behind writes, but never by more than the bound you choose, and the tighter the bound, the closer this level behaves to strong consistency.
  • Session Consistency: Ensures that a writer always reads its own writes within its session, although other users may briefly read stale data. This is the default consistency level for Azure Cosmos DB.
  • Consistent Prefix: Reads may lag behind the latest writes, but they never see out-of-order writes. What a reader sees is always some prefix of the sequence of writes.
  • Eventual Consistency: Provides no guarantees on the freshness of the data or on the order. However, this provides the fastest performance.
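
The difference between these levels is easier to see in a toy simulation. The sketch below models one primary and one lagging replica in memory. It is an illustration of the guarantees only, not how Cosmos DB implements them, and every name in it (`Replica`, `write`, `replicate`, `read_session`, `within_bound`) is hypothetical.

```python
class Replica:
    def __init__(self):
        self.log = []  # writes applied so far, in order

    @property
    def lsn(self):
        """Number of writes this copy has applied."""
        return len(self.log)

primary, replica = Replica(), Replica()

def write(value):
    """Write to the primary and return a session token (the write's position)."""
    primary.log.append(value)
    return primary.lsn

def replicate(count=1):
    """Copy the next `count` writes to the replica, in order (a consistent prefix)."""
    for _ in range(count):
        if replica.lsn < primary.lsn:
            replica.log.append(primary.log[replica.lsn])

def read_eventual():
    """Read whatever the replica has, however stale."""
    return replica.log[-1] if replica.log else None

def read_session(token):
    """Use the replica only if it has caught up to this session's last write."""
    source = replica if replica.lsn >= token else primary
    return source.log[-1] if source.log else None

def within_bound(max_lag):
    """Bounded staleness check: is the replica at most `max_lag` writes behind?"""
    return primary.lsn - replica.lsn <= max_lag

write("v1")
replicate()
token = write("v2")  # the replica has not received this write yet

print(read_eventual())       # stale value from the replica
print(read_session(token))   # the session still sees its own write
print(within_bound(1), within_bound(0))
```

Because the replica applies writes strictly in order, every read here also satisfies consistent prefix. Eventual consistency would additionally allow reads to hit replicas that disagree about ordering for a while.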

Tooling

In addition to the different APIs through which you can store or query data in Azure Cosmos DB, you can also call these APIs programmatically from languages such as Java, .NET, Python, JavaScript and Go. Microsoft also provides strong tooling support around Cosmos DB that simplifies many operations. Some of the tools include:

  • dtui.exe and dt.exe: These are GUI and command line tools that help you to migrate your data from different sources such as JSON, BSON, SQL Server, MongoDB, DynamoDB, HBase, CSV and Blobs into Azure Cosmos DB.

    This migration tool can be downloaded from GitHub, or you can directly download the pre-compiled binary.
  • Azure Cosmos DB Emulator: As the name suggests, this tool provides a local environment that emulates Cosmos DB service so that you can use it for developing and testing needs without incurring any costs. Once you are satisfied with your results with the Cosmos DB emulator, you can deploy the data to the Cosmos DB instance in Azure from the emulator.

    The emulator can be downloaded from the Microsoft Download Center.
  • Azure Cosmos DB Explorer: This is a standalone web-based tool that provides a one-stop interface to manage your Cosmos DB data. Apart from data management, the Azure Cosmos DB Explorer also lets you give temporary or permanent access to the data in your containers to other users who may not be able to access it through the Azure portal. It can also be used to share the results of your queries with other users.

    To access the Azure Cosmos DB Explorer – go to https://cosmos.azure.com. You will need your account connection strings to be able to connect your database instance.
  • Capacity Planner: It’s a handy tool that gives you a quick estimate of the approximate Request Units (RUs) that you will need for your planned workload. The capacity planner helps you fine-tune your throughput and storage estimates. Based on the estimated RUs for your requirements, you can then select the appropriate pricing model from the Azure portal for your containers.

    Click here to go to the capacity planner for Azure Cosmos DB
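
To give a feel for the kind of arithmetic behind such an estimate, here is a rough back-of-the-envelope sketch. It is not the capacity planner’s actual formula. The `estimate_rus` function is hypothetical, the per-operation RU costs and the headroom are assumed placeholder values that you should replace with figures measured for your own items and queries, and rounding up to a multiple of 100 RU/s is likewise an assumption made for this example.

```python
import math

def estimate_rus(reads_per_sec, writes_per_sec,
                 ru_per_read=1.0,    # assumed cost of one small point read
                 ru_per_write=5.0,   # assumed placeholder cost of one small write
                 headroom=0.2):      # assumed 20% safety margin
    """Rough RU/s estimate: operations per second times cost per operation."""
    raw = reads_per_sec * ru_per_read + writes_per_sec * ru_per_write
    padded = raw * (1 + headroom)
    # Round up to the next multiple of 100 RU/s (an assumption for this sketch).
    return math.ceil(padded / 100) * 100

estimate = estimate_rus(reads_per_sec=450, writes_per_sec=100)
print(estimate)
```

Real request charges vary with item size, indexing and query complexity, so treat a number like this only as a starting point before using the capacity planner.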

Azure Cosmos DB Usage Scenarios

Azure Cosmos DB is suitable for any high-performance application that requires global scale. It is specifically designed to handle applications that require low response times with massive amounts of reads and writes. Some of the cases where it makes a great fit are:

  • Globally Distributed Applications: Businesses that need to provide low latency data access to users at a massive scale over geographies and ensure high availability and disaster recovery across multiple data centers/regions.
  • IoT and Telemetry Applications: Infrastructure to support ingestion of huge volumes of disparate data from many devices.
  • E-Commerce Platforms: Websites that need to scale elastically to handle traffic spikes around events such as the Super Bowl or Black Friday.
  • Recommendation/Classification Engines: Applications that collect customer data such as interests, browsing history and buying patterns, and use machine learning models to quickly provide predictive insights on customer behavior.
  • Operational Logging and Analytics: Applications that store and analyze huge volumes of log data and other associated data at a scale to provide operational insights quickly and accurately.
  • Gaming Applications: Applications that need to support sudden spurts in usage, along with super low latency required to provide an optimal gaming user experience.
  • Social Media Applications: Applications that run on a global scale and have unpredictable usage loads such as tweets, blog or image posts, comments or chat sessions.

Wrapping Up

As we discussed above, Azure Cosmos DB offers a wide range of features that make it easy and cost-effective to store data for globally distributed workloads, with guaranteed throughput and low latency. If you need to store and process planet-scale data in a NoSQL data store, Azure Cosmos DB should be your first choice for building that infrastructure.

In the next post, we will look at some of the design considerations involved in designing an Azure Cosmos DB database. Until then, stay tuned.

In the meantime, if you are trying to import data from other databases into Cosmos DB, Netwoven can help! As an Elite Gold certified Microsoft Partner, we can help you develop, deploy or monitor apps in Azure as per your business needs.
