MongoDB, the company behind the popular, document-oriented NoSQL database, has rolled out MongoDB 4.4 in public beta, with new features and improvements intended to bolster the database’s ability to work reliably at scale.
MongoDB has long had mechanisms for scaling out by way of sharding, or distributing data across multiple nodes. Documents were associated with a specific shard, or node, by way of a shard key. Because the shard key of a document could not be changed after assignment, every document stayed on a given shard for life, which made it difficult to rebalance shards as the contents of a MongoDB database evolved.
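A toy sketch of the underlying idea, not MongoDB's actual routing logic: a shard key deterministically maps each document to one shard, so as long as the key can't change, neither can the document's home.

```python
# Illustrative sketch (not MongoDB's implementation): hashing a shard key
# to pick a shard. Shard names and the hash scheme are invented.
import hashlib

SHARDS = ["shard-a", "shard-b", "shard-c"]

def shard_for(shard_key_value: str) -> str:
    """Deterministically route a document to a shard by its shard key."""
    digest = hashlib.md5(shard_key_value.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Every document with the same shard key lands on the same shard, for life:
print(shard_for("customer-42") == shard_for("customer-42"))  # True
```

Because the mapping is a pure function of the key, rebalancing data under this scheme means changing the key itself, which is exactly what MongoDB 4.4 now permits.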
To ease rebalancing, MongoDB 4.4 introduces “refinable shard keys,” which allow documents’ shard keys to be changed so the documents can be relocated to different shards. Using refinable shard keys, documents that belong together on a given shard can be brought together as requirements change, and documents can be dynamically or programmatically rebalanced over time to better match evolving access patterns.
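In the mongo shell, refining a shard key is an admin command; the sketch below assumes a sharded collection test.orders originally keyed on customer_id alone, with the new key extending the old one (collection and field names are illustrative):

```javascript
// Refine the shard key of test.orders from { customer_id: 1 }
// to { customer_id: 1, order_id: 1 } (MongoDB 4.4+).
// The existing key must be a prefix of the refined key.
db.adminCommand({
  refineCollectionShardKey: "test.orders",
  key: { customer_id: 1, order_id: 1 }
})
```

This must be run against a live sharded deployment, so it is shown here as a fragment rather than a runnable example.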
Aggregations in MongoDB, essentially queries, gain several new capabilities in MongoDB 4.4. Unions allow data from different collections within a MongoDB database to be combined in a single aggregation. This way, the data doesn’t have to be consolidated through a separate ETL (extract, transform, and load) step; it can be combined in place, on the server, and returned to the client without needing multiple round trips to obtain the complete result set.
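The new stage is $unionWith. Below, a pymongo-style pipeline shows the shape of such a query (collection names are illustrative), and since no server is available here, the union step itself, which effectively concatenates the two result sets, is modelled in plain Python:

```python
# Pipeline as it would be passed to pymongo's collection.aggregate()
# against orders_2020 (names are illustrative, MongoDB 4.4+ required):
pipeline = [
    {"$unionWith": {"coll": "orders_2019"}},          # append the other collection
    {"$group": {"_id": "$status", "n": {"$sum": 1}}}, # then aggregate over both
]

# Without a server, model the union step in plain Python:
orders_2020 = [{"status": "shipped"}, {"status": "pending"}]
orders_2019 = [{"status": "shipped"}]

unioned = orders_2020 + orders_2019  # $unionWith effectively concatenates
counts = {}
for doc in unioned:
    counts[doc["status"]] = counts.get(doc["status"], 0) + 1
print(counts)  # {'shipped': 2, 'pending': 1}
```

The whole computation happens in one server round trip; the client never fetches the two collections separately.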
Custom aggregation expressions, such as the $function operator, now allow more complex aggregations to be executed server-side, again keeping the processing close to the data. This is essentially MongoDB’s version of stored procedures, a feature long found in conventional relational databases but appearing in MongoDB for the first time. However, $function carries a performance cost, so it is recommended only when the built-in aggregation expressions aren’t enough.
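A $function expression wraps a JavaScript function body that the server evaluates per document. The sketch below builds the expression as a pymongo-style document (the price and qty fields are illustrative) and models the server-side evaluation in Python:

```python
# Shape of a MongoDB 4.4 $function expression (field names illustrative).
# The body is JavaScript that the server runs for each document.
line_total = {
    "$function": {
        "body": "function(price, qty) { return price * qty; }",
        "args": ["$price", "$qty"],   # document fields passed as arguments
        "lang": "js",                 # "js" is the only supported language
    }
}

# Modelling what the JS body would compute for one document:
doc = {"price": 9.5, "qty": 3}
result = doc["price"] * doc["qty"]
print(result)  # 28.5
```

Note that this same multiplication could be done with the built-in $multiply expression, which is exactly the point of the performance advice: reach for $function only when no built-in expression fits.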
Other new features improve how MongoDB handles reading from nodes and satisfying requests. The “hedged reads” feature takes incoming read requests, routes them to multiple nodes capable of satisfying the request, and serves the request with the fastest response. In the same vein are “mirrored reads,” where a sample of the reads served by the primary is also sent to secondary replicas to keep their caches warm, so that a secondary promoted after a restart or failover doesn’t start from a cold cache.
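The hedged-read idea can be modelled in a few lines: fan the same read out to every eligible node and return whichever answers first. Node names and latencies below are invented for illustration.

```python
# Toy model of hedged reads: issue the same read against every eligible
# node and serve the request with the first response that comes back.
import concurrent.futures
import time

NODE_LATENCY = {"node-1": 0.20, "node-2": 0.05, "node-3": 0.40}

def read_from(node: str) -> str:
    time.sleep(NODE_LATENCY[node])   # simulated network + disk latency
    return f"result-from-{node}"

def hedged_read() -> str:
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(read_from, n) for n in NODE_LATENCY]
        # Take whichever node answered first; the slower reads are wasted work,
        # which is the deliberate trade-off hedging makes for lower tail latency.
        done, _ = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED
        )
        return next(iter(done)).result()

print(hedged_read())  # result-from-node-2 (the fastest node wins)
```

The cost of hedging is the duplicated reads sent to the slower nodes; the benefit is that one slow or stalled node no longer drags down the request's latency.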