Relational databases store the vast majority of web application persistent data. However, there are several alternative classifications of storage representations.
These persistent data storage representations are commonly used to augment, rather than completely replace, relational databases. The underlying persistence type used by a NoSQL database often gives it different performance characteristics than a relational database, with better results on some types of reads and writes and worse performance on others.
Key-value pair data stores are based on hash map data structures.
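As a rough illustration of the hash map idea, the sketch below wraps a Python dict in a tiny get/set/delete interface. The class name `KeyValueStore` and its methods are hypothetical and exist only for this example; real stores such as Redis add networking, expiry, persistence and richer data types on top of this basic shape.

```python
class KeyValueStore:
    """Toy in-memory key-value store backed by a dict (a hash map)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # Overwrites any existing value stored under the same key.
        self._data[key] = value

    def get(self, key, default=None):
        # Average-case constant-time lookup by key.
        return self._data.get(key, default)

    def delete(self, key):
        # Returns True if the key existed and was removed, False otherwise.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.set("session:42", "alice")
```

Lookups are by exact key only. There is no way to query by value without scanning every entry, which is one reason key-value stores tend to complement a relational database rather than replace it.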
Redis is an open source in-memory key-value pair data store. Redis is often called "the Swiss Army Knife of web application development." It can be used for caching, queuing, and storing session data for faster access than a traditional relational database, among many other use cases. Learn more on the Redis page.
Memcached is another widely used in-memory key-value pair storage system.
How to Use Redis with Python 3 and redis-py on Ubuntu 16.04 contains detailed steps to install and start using Redis in Python.
"How To Install and Use Redis" is a guide for getting up and running with the extremely useful in-memory data store.
This video on Scaling Redis at Twitter is a detailed look behind the scenes with a massive Redis deployment.
Walrus is a higher-level Python wrapper for Redis with some caching, querying and data structure components built into the library.
Real World Redis Tips provides some guidance from Heroku's engineers from deploying Redis at scale. The tips include setting an explicit idle connection timeout, using a connection pooler and avoiding using KEYS in favor of SCAN.
Writing Redis in Python with Asyncio shows a detailed example of how to use the asyncio module, part of the standard library in Python 3.4+, for working with Redis.
How to collect Redis metrics shows how to use the Redis CLI client to grab key metrics on latency.
You should revise your Redis max connections setting is a retrospective from a hard web application failure due to Redis connections maxing out on Heroku, and how to avoid this in your own applications by modifying your redis.conf settings.
A document-oriented database provides a semi-structured representation for nested data.
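To make "semi-structured" and "nested" concrete, here is a minimal sketch that uses plain Python dicts as documents and looks up a nested field with a dotted path. The helpers `get_path` and `find` and the sample `users` data are made up for this illustration and are not the API of any particular document database.

```python
def get_path(document, dotted_path, default=None):
    """Walk a nested dict using a dotted path such as "address.city"."""
    current = document
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


def find(collection, dotted_path, expected):
    """Return every document whose value at dotted_path equals expected."""
    return [doc for doc in collection if get_path(doc, dotted_path) == expected]


users = [
    {"name": "Ada", "address": {"city": "London"}, "tags": ["admin"]},
    {"name": "Grace", "address": {"city": "New York"}},
    {"name": "Linus"},  # no address: documents need not share a schema
]

london_users = find(users, "address.city", "London")
```

Each document carries its own structure, so the third user can omit the address entirely without any schema change.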
MongoDB is an open source document-oriented data store with a Binary JSON (BSON) storage format that is JSON-style and familiar to web developers. PyMongo is a commonly used client for interfacing with one or more MongoDB instances through Python code. MongoEngine is a Python object-document mapper (similar to an ORM) specifically written for MongoDB that is built on top of PyMongo.
Riak is an open source distributed data store focused on availability, fault tolerance and large scale deployments.
Apache CouchDB is also an open source project where the focus is on embracing RESTful-style HTTP access for working with stored JSON data.
The creator and maintainers of PyMongo review four decisions they regret from building the widely-used Python MongoDB driver.
The Python and MongoDB Talk Python to Me podcast has a great interview with the maintainer of the Python driver for MongoDB.
MongoDB queries don’t always return all matching documents! is a walkthrough of discovering how MongoDB queries actually work, and shows some potential pitfalls of relying on technologies where you do not fully understand how they operate.
Introduction to MongoDB and Python shows how to use Python to interface with MongoDB via PyMongo and MongoEngine.
A column-family table class of NoSQL data stores builds on the key-value pair type. Each key-value pair is considered a row in the store while the column family is similar to a table in the relational database model.
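One way to picture this layout is as nested dictionaries: a column family maps row keys to rows, and each row is itself a set of column name and value pairs. The sketch below follows that description only; the names `column_families`, `put` and `get` are hypothetical, and real column-family stores add details such as timestamps and on-disk sorting that are left out here.

```python
from collections import defaultdict

# column family name -> row key -> {column name: value}
column_families = defaultdict(dict)


def put(family, row_key, columns):
    """Insert or update columns for one row; rows may hold different columns."""
    column_families[family].setdefault(row_key, {}).update(columns)


def get(family, row_key):
    """Return a copy of the row's columns, or an empty dict if it is absent."""
    # .get() avoids creating empty families as a side effect of reading.
    return dict(column_families.get(family, {}).get(row_key, {}))


put("users", "user:1", {"name": "Ada", "email": "ada@example.com"})
put("users", "user:2", {"name": "Grace"})  # no email column for this row
put("users", "user:1", {"city": "London"})  # adds a column to an existing row
```

Unlike a relational table, rows in the same column family do not need to share the same set of columns.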
Apache HBase
A graph database represents and stores data in three aspects: nodes, edges and properties.
A node is an entity, such as a person or business.
An edge is the relationship between two entities. For example, an edge could represent that a node for a person entity is an employee of a business entity.
A property represents information about nodes. For example, an entity representing a person could have a property of "female" or "male".
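The three aspects above can be sketched with a couple of small Python classes. This is an illustrative in-memory model only: `Node`, `Edge`, `Graph` and the `EMPLOYEE_OF` label are names invented for the example, not the data model or query API of any specific graph database.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str  # node_id of the starting node
    target: str  # node_id of the ending node
    label: str   # the kind of relationship


class Graph:
    def __init__(self):
        self.nodes = {}  # node_id -> Node
        self.edges = []  # list of Edge

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def add_edge(self, source, target, label):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append(Edge(source, target, label))

    def neighbors(self, node_id, label):
        """Return nodes reachable from node_id over edges with this label."""
        return [
            self.nodes[edge.target]
            for edge in self.edges
            if edge.source == node_id and edge.label == label
        ]


graph = Graph()
graph.add_node("alice", kind="person")
graph.add_node("acme", kind="business")
graph.add_edge("alice", "acme", "EMPLOYEE_OF")
employers = graph.neighbors("alice", "EMPLOYEE_OF")
```

Here the person and the business are nodes, the employment relationship is an edge, and `kind` is a property stored on each node.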
Neo4j is one of the most widely used graph databases and runs on the Java Virtual Machine stack.
Cayley is an open source graph data store that originated at Google and is written primarily in Go.
Titan is a distributed graph database built for multi-node clusters.
Introduction to Graph Databases covers trends in NoSQL data stores and compares graph databases to other data store types.
Graph search algorithm basics explains the methods for searching nodes for data in a graph database.
NoSQL databases: an overview explains what NoSQL means, how data is stored differently than in relational systems and what the Consistency, Availability and Partition-Tolerance (CAP) Theorem means.
NoSQL Explained is a good high-level overview of considerations and features when choosing a type of NoSQL database compared to a relational database.
CAP Theorem overview presents the basic constraints all databases must trade off in operation.
This post on What is a NoSQL database? Learn By Writing One in Python is a detailed article that breaks the mystique behind what some forms of NoSQL databases are doing under the covers.
The CAP Theorem series explains concepts related to NoSQL such as what is ACID compared to CAP, CP versus CA and high availability in large scale deployments.
NoSQL Weekly is a free curated email newsletter that aggregates articles, tutorials, and videos about non-relational data stores.
NoSQL comparison is a large list of popular, BigTable-based, special purpose, and other datastores with attributes and the best use cases for each one.
Relational databases such as MySQL and PostgreSQL have added features in more recent versions that mimic some of the capabilities of NoSQL data stores. For example, check out this blog post on storing JSON data in PostgreSQL.
Understand why NoSQL data stores are better for some use cases than relational databases. In general these benefits are only seen at large scale so they may not be applicable to your web application.
Integrate Redis into your project for a speed boost over slower persistent storage. Storing session data in memory is generally much faster than saving that data in a traditional relational database that uses persistent storage. Note that when memory is flushed the data goes away so anything that needs to be persistent must still be backed up to disk on a regular basis.
Evaluate other use cases such as storing transient logs in a document-oriented data store such as MongoDB.
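The Redis item in the list above describes keeping frequently read data, such as sessions, in memory in front of slower persistent storage. The sketch below imitates that pattern with only the standard library so it runs without a Redis server. `TTLCache` and `load_session` are hypothetical names, and a plain dict stands in for the relational database.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries = {}   # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: behave as a cache miss
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)


def load_session(session_id, cache, database):
    """Read from the cache first; on a miss, fall back to the database."""
    session = cache.get(session_id)
    if session is None:
        session = database[session_id]  # the slow, persistent source of truth
        cache.set(session_id, session)
    return session
```

The persistent store remains the source of truth, so losing the in-memory copy costs only a slower next read rather than lost data.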