
BigDL: Distributed Deep Learning on Apache Spark

What is BigDL?

BigDL is a distributed deep learning library for Apache Spark; with BigDL, users can write their deep learning applications as standard Spark programs, which can directly run on top of existing Spark or Hadoop clusters.

  • Rich deep learning support. Modeled after Torch, BigDL provides comprehensive support for deep learning, including numeric computing (via Tensor) and high level neural networks; in addition, users can load pre-trained Caffe or Torch models into Spark programs using BigDL.

  • Extremely high performance. To achieve high performance, BigDL uses Intel MKL and multi-threaded programming in each Spark task. Consequently, it is orders of magnitude faster than out-of-the-box open source Caffe, Torch, or TensorFlow on a single-node Xeon (i.e., comparable with mainstream GPU).

  • Efficient scale-out. BigDL can efficiently scale out to perform data analytics at "Big Data scale", by leveraging Apache Spark (a lightning fast distributed data processing framework), as well as efficient implementations of synchronous SGD and all-reduce communications on Spark.
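The synchronous-SGD scheme mentioned above can be illustrated with a toy sketch: each worker computes a gradient on its own data shard, the gradients are averaged across workers (the all-reduce step), and every worker applies the same update so all model replicas stay identical. This is a self-contained illustration of the idea only, not BigDL's actual implementation; the "workers" below are plain Python loops fitting a hypothetical 1-D model y = w*x.

```python
def local_gradient(w, data):
    """Mean-squared-error gradient for y = w*x on one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

def sync_sgd(shards, w=0.0, lr=0.05, steps=100):
    """One step = local gradients, all-reduce average, identical update."""
    for _ in range(steps):
        grads = [local_gradient(w, shard) for shard in shards]
        avg = sum(grads) / len(grads)  # all-reduce: sum, then divide
        w -= lr * avg                  # every replica applies the same delta
    return w

# Two "workers", each holding part of a dataset generated by y = 3x.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w = sync_sgd(shards)  # converges toward w = 3.0
```

In BigDL the same pattern runs inside Spark tasks, with the gradient aggregation implemented as an efficient all-reduce over the cluster rather than a local loop.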

Why BigDL?

You may want to write your deep learning programs using BigDL if:

  • You want to analyze a large amount of data on the same Big Data (Hadoop/Spark) cluster where the data are stored (in, say, HDFS, HBase, Hive, etc.).

  • You want to add deep learning functionalities (either training or prediction) to your Big Data (Spark) programs and/or workflow.

  • You want to leverage existing Hadoop/Spark clusters to run your deep learning applications, which can then be dynamically shared with other workloads (e.g., ETL, data warehouse, feature engineering, classical machine learning, graph analytics, etc.).

How to use BigDL?

  • To learn how to install and build BigDL (on both Linux and macOS), you can check out the Build Page
  • To learn how to run BigDL programs (as either a local Java program or a Spark program), you can check out the Getting Started Page
  • To learn the details of Python support in BigDL, you can check out the Python Support Page
  • To try BigDL out on EC2, you can check out the Running on EC2 Page
  • To learn how to create practical neural networks using BigDL in a couple of minutes, you can check out the Tutorials Page
  • For more details, you can check out the Documents Page (including Tutorials, Examples, Programming Guide, etc.)

Support
