Scala for Data Science

Pascal Bugnion / Scala / Data Science / Data Engineering

Data science is fashionable. Startups are sprouting across the globe and established companies are scrambling to assemble data science teams. The rapid increase in popularity of data science is accompanied by increased ambition: data science teams are expected to build increasingly complex applications that can scale to larger datasets, deal with dirtier data and use more sophisticated learning algorithms.

Scala is a powerful language for building robust data science applications. By emphasising immutability and functional constructs, Scala lends itself to the construction of robust libraries for concurrency and big data analysis. A rich ecosystem of data science tools has therefore developed around Scala, including libraries for accessing SQL and NoSQL databases, frameworks such as Apache Spark for building distributed applications, and libraries for linear algebra and numerical algorithms.
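To make the link between immutability and concurrency concrete, here is a minimal sketch that uses only the Scala standard library. The `Measurement` case class, the `mean` and `meanBySensor` functions and the sample data are hypothetical names invented for this illustration; they are not taken from the book. Because the data is immutable and `mean` is a pure function, each group can be processed in its own `Future` without any locking.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ImmutableConcurrencySketch {

  // Hypothetical immutable record: instances cannot change after construction.
  final case class Measurement(sensor: String, value: Double)

  // Pure function: the result depends only on the argument.
  def mean(values: Seq[Double]): Double =
    if (values.isEmpty) 0.0 else values.sum / values.size

  // Computes the mean value per sensor, one Future per sensor.
  // No shared mutable state is involved, so no synchronisation is needed.
  def meanBySensor(data: Vector[Measurement]): Map[String, Double] = {
    val futures: Vector[Future[(String, Double)]] =
      data.groupBy(_.sensor).toVector.map { case (sensor, measurements) =>
        Future(sensor -> mean(measurements.map(_.value)))
      }
    // Blocking here keeps the sketch short; a real application would
    // usually compose the Future further rather than wait on it.
    Await.result(Future.sequence(futures), 10.seconds).toMap
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample data, for illustration only.
    val sample = Vector(
      Measurement("a", 1.0),
      Measurement("a", 3.0),
      Measurement("b", 10.0)
    )
    println(meanBySensor(sample))
  }
}
```

The same shape (split immutable data, apply a pure function to each part concurrently, combine the results) is what distributed frameworks generalise across machines, although their APIs differ from this standard-library sketch.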

This rich ecosystem has fostered the growth of an extensive, active community around Scala and data science. This community, coupled with the absence of legacy libraries, ensures that open source projects grow quickly, that issues are resolved rapidly, and that there is a wealth of documentation and tutorials covering common use cases. By combining the best elements of functional programming with a healthy pragmatism, Scala makes programming fun.

Scala for Data Science introduces the core libraries necessary for building data science applications. The book covers interacting with SQL and NoSQL databases, tools for concurrency and parallelisation, building distributed data analysis systems with Apache Spark, and developing RESTful APIs with the Play framework.

For the table of contents and a sample chapter describing how to build parallel cross-validation pipelines, head over to the publisher's website.
