As data science goes mainstream, so does its language

Python may be the second choice to R, but its popularity and ease of use position it to dominate data science.

“When [Netflix’s data science team] started, there was one single kind of data scientist,” said Christine Doig, director of innovation for personalised experiences at Netflix. “Now the role has been integrated into the organisation.”

This isn’t just a Netflix thing. Across all industries, enterprises are embracing data science to craft personalised, engaging experiences, optimise pricing, and more. As they do so, they’re expanding the use of data science into product management, marketing, and other areas.

This is why the language that organisations use to decipher their data will increasingly be Python, not R. As organisations look to a more diverse group to help with data science, Python’s mass appeal makes for an easy on-ramp.

R or Python?

Historically, if you wanted to do data science, you needed to know R. As detailed on the R project’s site, “R is an integrated suite of software facilities for data manipulation, calculation, and graphical display.” It’s not really a programming language, per se, but includes one. 

Originally built for statistical and numerical analysis, R has stayed true to those roots and remains an excellent tool, particularly for statisticians in their role as data scientists. This strength can also be a weakness, given the spread of data science well beyond the area of statistical analysis.

It’s true, as Sheetal Kalburgi, associate product manager at Anaconda, points out, that “data scientists are more technical and statistical” and often are “responsible for tasks like developing complex statistical algorithms that communicate product performance, predict outcomes, design experiments such as A/B testing, and optimise computational operations, to name a few.” 

But they also tend to be well versed in programming. Your average data scientist is much more likely to have a programming background than a hard-core statistics background.

Even if a company’s business problem centres on statistics, Python will still often prove superior, if only because of familiarity.
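To make that concrete, here is a minimal sketch of one of the statistical tasks mentioned above, an A/B test, written with nothing but Python’s standard library. The function name `two_proportion_z_test` and the conversion figures are hypothetical, chosen only for illustration, and the pooled two-proportion z-test is just one common way to compare two conversion rates, not a method attributed to anyone quoted here.

```python
from math import sqrt
from statistics import NormalDist  # standard library, Python 3.8+


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled z-test for a difference in conversion rates.

    conv_a, conv_b: number of conversions in variants A and B
    n_a, n_b: number of visitors shown variants A and B
    Returns the z statistic and the two-sided p-value.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: 120 of 1,000 visitors convert on A, 150 of 1,000 on B.
z, p_value = two_proportion_z_test(conv_a=120, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A statistician might reach for R here, but a developer who already knows Python can read and adapt this in minutes, which is the familiarity argument in miniature.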

As Van Lindberg, general counsel for the Python Software Foundation, told me, “Python is the second-best language for everything. R may be the best for stats, but Python is the second … and the second-best for [machine learning], web services, shell tools, and (insert use case here). If you want to do more than just stats, then Python’s breadth is an overwhelming win.”

No one really wants the silver medal instead of gold, but in this case, second place means Python will make itself useful for a much broader array of use cases. As Peter Wang, CEO of Anaconda, said in an interview, “Python had a broader scope from the beginning.” Engineering and science DNA is “baked into the Python core.” It’s therefore going to be the right answer much more often than R.

Python swallows data science

That’s not a criticism of R so much as a recognition of the momentum and mass Python has going for it. According to a recent SlashData survey of more than 20,000 developers, Python is a developer darling, coming in second only to JavaScript in terms of popularity. 

Part of this stems from the huge community around Python that extends Python’s utility into all sorts of domains (deep learning, artificial intelligence, and more) while fine-tuning it in key areas to improve performance. It’s increasingly difficult to find any areas where Python isn’t pushing to be the first-choice option, not merely “second best,” to use Lindberg’s phrasing.

Part of Python’s popularity stems simply from how easy it is to use. Given that enterprises are desperately trying to find data science talent, the easiest path is to mint data scientists from existing employees. Even those without an engineering background find it easy to embrace Python’s simple syntax and readability and appreciate how useful it is for quick prototyping.

Lately, Python has become even easier to use with Anaconda’s release of PyScript, which makes Python more accessible to front-end developers by letting them write Python in HTML to build web applications. This is just one more in a long string of innovations from the Python community that expand the breadth and depth of what developers and data scientists can do with Python.

Those innovations, and the Python community that benefits from them, increasingly make the decision to use Python that much easier. 

For areas where R or another alternative might be first choice, Wang suggests Python’s history as a great glue language means that “maybe someone will build a nice Python wrapper to expose a thin shim to expose some R capabilities” or otherwise make it easy for a data scientist to build with Python while adding complements from other communities, like R.

All this helps explain why Python looks set to help drive the next decade of data science, given how robust it is for experienced data scientists and less-experienced aspirants.
