{"id":741,"date":"2015-02-08T21:56:36","date_gmt":"2015-02-09T02:56:36","guid":{"rendered":"http:\/\/shirishranjit.com\/blog1\/?page_id=741"},"modified":"2015-03-19T07:15:30","modified_gmt":"2015-03-19T11:15:30","slug":"apache-spark-vs-apache-strom","status":"publish","type":"page","link":"https:\/\/shirishranjit.com\/blog1\/big-data\/apache-and-bigdata\/apache-spark-vs-apache-strom","title":{"rendered":"Apache Spark vs Apache Storm &#8211; which one to pick"},"content":{"rendered":"<h2>Key Differences based on Technical Requirements<\/h2>\n<ul>\n<li>Latency: Is low latency critical to the streaming application? Storm can deliver sub-second latency much more easily and with fewer restrictions than Spark Streaming.<\/li>\n<li>Development Cost: Do you need to have similar code bases for batch processing <em>and<\/em> stream processing? With Spark, batch and streaming code are <em>very<\/em> similar. Storm, however, departs dramatically from the MapReduce paradigm.<\/li>\n<li>Message Delivery Guarantees: Do you need guaranteed delivery of <em>every<\/em> single record, or is some nominal amount of data loss acceptable? Disregarding everything else, Spark trivially yields perfect, exactly-once message delivery. Storm can provide all three delivery semantics (at-most-once, at-least-once, and exactly-once), but achieving exactly-once delivery properly requires more effort.<\/li>\n<li>Fault Tolerance: Must your processes be fault tolerant? Both systems handle process failures well and in relatively similar ways.\n<ul>\n<li>Production Storm clusters run Storm processes under <a href=\"http:\/\/en.wikipedia.org\/wiki\/Process_supervision\">supervision<\/a>; if a process fails, the supervisor process restarts it automatically. State management is handled through ZooKeeper. 
Restarting processes reread their state from ZooKeeper when they attempt to rejoin the cluster.<\/li>\n<li>Spark handles restarting workers via the resource manager: YARN, Mesos, or its standalone manager. Spark\u2019s standalone resource manager handles master node failure with standby masters and ZooKeeper. Alternatively, this can be handled more primitively with just local filesystem state checkpointing, which is not typically recommended for production environments.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>Both Apache Spark Streaming and Apache Storm are great solutions that solve the streaming ingestion and transformation problem. Either system can be a great choice for part of an analytics stack.<\/p>\n<h2 id=\"references\">References<\/h2>\n<ul>\n<li><a href=\"http:\/\/storm.incubator.apache.org\/documentation\/Home.html\">Apache Storm Home Page<\/a><\/li>\n<\/ul>\n<ul>\n<li><a href=\"http:\/\/spark.apache.org\">Apache Spark<\/a><\/li>\n<\/ul>\n<ul>\n<li><a href=\"http:\/\/www.zdatainc.com\/2014\/07\/real-time-streaming-apache-storm-apache-kafka\/\">Real Time Streaming with Apache Storm and Apache Kafka<\/a><\/li>\n<\/ul>\n<ul>\n<li><a href=\"http:\/\/www.zdatainc.com\/2014\/08\/real-time-streaming-apache-spark-streaming\/\">Real Time Streaming with Apache Spark (Streaming)<\/a><\/li>\n<\/ul>\n<ul>\n<li><a href=\"http:\/\/kafka.apache.org\/\">Apache Kafka<\/a><\/li>\n<\/ul>\n<ul>\n<li><a href=\"http:\/\/en.wikipedia.org\/wiki\/Data_parallelism\">Wikipedia: Data Parallelism<\/a><\/li>\n<\/ul>\n<ul>\n<li><a href=\"http:\/\/en.wikipedia.org\/wiki\/Task_parallelism\">Wikipedia: Task Parallelism<\/a><\/li>\n<\/ul>\n<ul>\n<li><a href=\"http:\/\/zookeeper.apache.org\">Apache ZooKeeper<\/a><\/li>\n<\/ul>\n<ul>\n<li><a href=\"https:\/\/github.com\/yahoo\/storm-yarn\">Yahoo! 
Storm-YARN<\/a><\/li>\n<\/ul>\n<ul>\n<li><a href=\"http:\/\/hortonworks.com\/kb\/storm-on-yarn-install-on-hdp2-beta-cluster\/\">Hortonworks: Storm on YARN<\/a><\/li>\n<\/ul>\n<ul>\n<li><a href=\"http:\/\/mesos.apache.org\">Apache Mesos<\/a><\/li>\n<\/ul>\n<ul>\n<li><a href=\"https:\/\/mesosphere.io\/learn\/run-storm-on-mesos\/\">Run Storm on Mesos<\/a><\/li>\n<\/ul>\n<ul>\n<li><a href=\"https:\/\/github.com\/mesosphere\/marathon\">Marathon<\/a><\/li>\n<\/ul>\n<ul>\n<li><a href=\"http:\/\/xinhstechblog.blogspot.com\/2014\/06\/storm-vs-spark-streaming-side-by-side.html\">Storm vs Spark Streaming: Side by Side<\/a><\/li>\n<\/ul>\n<ul>\n<li><a href=\"http:\/\/www.slideshare.net\/ptgoetz\/apache-storm-vs-spark-streaming\">Storm vs Spark Streaming (Slideshare)<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Key Differences based on Technical Requirements Latency: Is low latency critical to the streaming application? Storm can deliver sub-second latency much more easily and with fewer restrictions than Spark Streaming. 
Development Cost: Do you need to have &hellip; <a href=\"https:\/\/shirishranjit.com\/blog1\/big-data\/apache-and-bigdata\/apache-spark-vs-apache-strom\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":846,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-741","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/shirishranjit.com\/blog1\/wp-json\/wp\/v2\/pages\/741"}],"collection":[{"href":"https:\/\/shirishranjit.com\/blog1\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/shirishranjit.com\/blog1\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/shirishranjit.com\/blog1\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/shirishranjit.com\/blog1\/wp-json\/wp\/v2\/comments?post=741"}],"version-history":[{"count":8,"href":"https:\/\/shirishranjit.com\/blog1\/wp-json\/wp\/v2\/pages\/741\/revisions"}],"predecessor-version":[{"id":749,"href":"https:\/\/shirishranjit.com\/blog1\/wp-json\/wp\/v2\/pages\/741\/revisions\/749"}],"up":[{"embeddable":true,"href":"https:\/\/shirishranjit.com\/blog1\/wp-json\/wp\/v2\/pages\/846"}],"wp:attachment":[{"href":"https:\/\/shirishranjit.com\/blog1\/wp-json\/wp\/v2\/media?parent=741"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}