Databricks takes the human intervention out of Spark processing

A new workflow feature for Databricks Cloud can automate routine deployment tasks

Databricks now offers a way to schedule Spark jobs in the cloud

Databricks wants to make it possible to take humans out of the loop entirely when it comes to running complicated data analysis jobs.

The company, which sells a commercial version of Spark, now offers a tool to automate setting up and executing analysis jobs written for the open source data processing platform.

"You can express very complicated workflows using this thing," said Ali Ghodsi, Databricks' director of engineering. "There is no human in the loop any more."

Founded by several of the original developers of Spark, Databricks offers a commercial version of the platform designed to run on Amazon Web Services and eliminate many of the mundane chores of setting up and maintaining an in-house deployment.

Spark can be used to analyze very large data sets across multiple servers, for tasks such as generating recommendations for users of an Internet service or predicting a company's future revenue.

As customers get more comfortable with big data, they are increasingly running their analysis jobs on a regular schedule. That routine has required an administrator to log into a console and coordinate all the steps needed to run each job.

The new feature for Databricks Cloud, called Jobs, lets administrators set up schedules to run standalone Spark jobs at specified intervals. For example, a user could set a Spark application to run on a specific Databricks Cloud cluster at a given time. Users can choose either a dedicated cluster for maximum performance or a cluster shared with other users to save money.
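To make the interval scheduling concrete, here is a minimal sketch of how a scheduler might compute upcoming run times for a recurring job. It is not the Databricks Jobs API: the `ScheduledJob` class, its fields and the `ClusterMode` labels are hypothetical, chosen only to mirror the options described above (a target cluster, dedicated or shared, and a fixed interval).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class ClusterMode(Enum):
    # Hypothetical labels for the two cluster options described in the article.
    DEDICATED = "dedicated"  # maximum performance
    SHARED = "shared"        # cheaper, shared with other users


@dataclass
class ScheduledJob:
    # Illustrative fields only; this is NOT the Databricks Jobs API.
    name: str
    cluster: str
    mode: ClusterMode
    start: datetime
    interval: timedelta

    def next_runs(self, after: datetime, count: int) -> list:
        """Return the next `count` run times strictly after `after`."""
        if self.interval <= timedelta(0):
            raise ValueError("interval must be positive")
        if after < self.start:
            first = self.start
        else:
            # Whole intervals already elapsed since the start, plus one.
            elapsed = (after - self.start) // self.interval
            first = self.start + (elapsed + 1) * self.interval
        return [first + i * self.interval for i in range(count)]
```

For instance, a hypothetical daily job starting at 02:00 on a shared cluster, asked at midday for its next two runs, returns 02:00 on each of the following two days. Computing run times from a fixed start (rather than "now plus interval") keeps the schedule from drifting if the scheduler itself wakes up late.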

The service notifies the user when the task completes. It also creates a log recording whether the task completed successfully, and can alert the administrator if something goes awry.

In effect, the feature provides a way to build a production pipeline: a series of jobs that execute automatically and in coordination with one another. For example, an administrator can set up a workflow that runs two Spark jobs at the same time and waits for both to finish. Once both have completed, the workflow can start another job that uses the results of the first two. If either of the initial jobs fails, the entire workflow can be terminated.
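The fan-out/fan-in pattern described above can be sketched in plain Python. This is an illustrative model only, assuming each job is an ordinary function; it does not show how Databricks implements workflows, and the `run_workflow` name and the in-memory `log` list (standing in for the service's run log and alerts) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def run_workflow(
    first: Callable[[], object],
    second: Callable[[], object],
    combine: Callable[[object, object], object],
    log: List[str],
) -> Optional[object]:
    """Run `first` and `second` concurrently, then `combine` their results.

    If either initial job raises, the workflow is terminated: `combine`
    never runs and None is returned. Each outcome is appended to `log`.
    """
    results = {}
    failed = False
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Start both initial jobs at the same time.
        futures = {"first": pool.submit(first), "second": pool.submit(second)}
        for name, fut in futures.items():
            try:
                results[name] = fut.result()  # blocks until this job finishes
                log.append(f"{name}: succeeded")
            except Exception as exc:
                log.append(f"{name}: failed ({exc})")
                failed = True
    if failed:
        log.append("workflow: terminated")
        return None
    # Both succeeded: the downstream job consumes their results.
    out = combine(results["first"], results["second"])
    log.append("combine: succeeded")
    return out
```

Calling `run_workflow(lambda: 2, lambda: 3, lambda a, b: a + b, log)` returns 5 and logs three successes; if either initial job raises, the result is None and the log ends with `workflow: terminated`. This sketch waits for the sibling job to finish before terminating; a production scheduler might instead cancel it immediately to free cluster resources.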

Jobs are written in Spark notebooks. Similar to IPython notebooks for Python, Spark notebooks are user-generated packages that contain all the components needed to run an interactive data analysis job across a cluster. Spark notebooks can be written in Python, Scala, SQL, or a combination of these.

Pricing for Databricks is tiered, based on usage capacity, support model and feature set. Prices will start at several hundred dollars per month.

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com
