Big data gets a new open-source project: Apache Arrow

It can speed up analytical workloads by more than 100x, the foundation says

Hadoop, Spark and Kafka have already had a defining influence on the world of big data, and now there's yet another Apache project with the potential to shape the landscape even further: Apache Arrow.

The Apache Software Foundation on Wednesday launched Arrow as a top-level project designed to provide a high-performance data layer for columnar in-memory analytics across disparate systems.

Based on code from the related Apache Drill project, Apache Arrow can bring benefits including performance improvements of more than 100x on analytical workloads, the foundation said. In general, it enables multi-system workloads by eliminating cross-system communication overhead.
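To make "columnar in-memory" concrete, here is a minimal sketch in plain Python. It is not Arrow's API; it uses only the standard-library `array` module, and the field names `price` and `qty` and the sample values are invented for illustration. It contrasts a row-oriented layout with one contiguous typed buffer per field.

```python
from array import array

# Row-oriented layout: one Python object per record (hypothetical sample data).
rows = [
    {"price": 9.5, "qty": 3},
    {"price": 20.0, "qty": 1},
    {"price": 4.25, "qty": 8},
]

# Column-oriented layout: one contiguous, typed buffer per field.
columns = {
    "price": array("d", (r["price"] for r in rows)),  # 64-bit floats
    "qty": array("q", (r["qty"] for r in rows)),      # 64-bit signed integers
}


def total_qty_rows(records):
    # Visits every record object even though only one field is needed.
    return sum(r["qty"] for r in records)


def total_qty_columns(cols):
    # Scans a single contiguous buffer holding just the needed field.
    return sum(cols["qty"])


row_total = total_qty_rows(rows)
col_total = total_qty_columns(columns)
```

Both functions return the same total; the difference is that the columnar version reads only the one buffer an analytical query actually needs.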

Code committers to the project include developers from other Apache big-data projects such as Calcite, Cassandra, Drill, Hadoop, HBase, Impala, Kudu, Parquet, Phoenix, Spark and Storm.

"The open-source community has joined forces on Apache Arrow," said Jacques Nadeau, vice president of both the new project and Apache Drill. "We anticipate the majority of the world's data will be processed through Arrow within the next few years."

In many workloads, between 70 percent and 80 percent of CPU cycles are spent serializing and deserializing data. Arrow alleviates that burden by enabling data to be shared among systems and processed with no serialization, deserialization or memory copies, the foundation said.
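The difference between serializing data and sharing it in place can be sketched with the Python standard library. This is an illustration of the general idea only, not of how Arrow is implemented: `pickle` stands in for any serialization format, `memoryview` stands in for a shared buffer, and the variable names are invented.

```python
import pickle
from array import array

# A column of 64-bit integers held in one contiguous buffer.
values = array("q", range(1_000))

# Path 1: serialize, ship the bytes, deserialize.
# The consumer ends up with a full, independent copy of the data.
wire_bytes = pickle.dumps(values)
copied = pickle.loads(wire_bytes)

# Path 2: hand the consumer a view over the same buffer.
# No bytes are encoded, decoded or copied.
shared = memoryview(values)

# A write by the producer shows up through the view but not in the copy,
# which confirms that the view and the original share memory.
values[0] = 42
```

After the write, `shared[0]` reflects the new value while `copied[0]` still holds the old one, because only the pickled path duplicated the data.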

"An industry-standard columnar in-memory data layer enables users to combine multiple systems, applications and programming languages in a single workload without the usual overhead," said Ted Dunning, vice president of the Apache Incubator and member of the Apache Arrow Project Management Committee.

Arrow also supports complex data with dynamic schemas in addition to traditional relational data. For instance, it can handle JSON data, which is commonly used in Internet-of-Things (IoT) workloads, modern applications and log files. Implementations are also available for a number of programming languages for greater interoperability.
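As a rough illustration of fitting JSON records with a varying set of fields into columns, the sketch below discovers field names as they appear and fills gaps with `None`. It uses only the standard-library `json` module; the device readings are made up, and Arrow's actual handling of nested and dynamically typed data is considerably more involved than this.

```python
import json

# Hypothetical IoT-style log lines; not every record has every field.
lines = [
    '{"device": "a1", "temp": 21.5}',
    '{"device": "b2", "temp": 19.0, "humidity": 40}',
    '{"device": "c3"}',
]
records = [json.loads(line) for line in lines]

# Discover the schema from the data, keeping first-seen field order.
fields = []
for rec in records:
    for key in rec:
        if key not in fields:
            fields.append(key)

# Build one column per field; a missing value becomes None (a null).
columns = {name: [rec.get(name) for rec in records] for name in fields}
```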

Apache Arrow software is available under the Apache License v2.0 and is overseen by a self-selected team of active contributors to the project.

