  1. Feb 12, 2015
    • Venkata Ramana Gollamudi's avatar
      [SPARK-5765][Examples]Fixed word split problem in run-example and compute-classpath · 629d0143
      Venkata Ramana Gollamudi authored
Author: Venkata Ramana G <ramana.gollamudi@huawei.com>
      
      Author: Venkata Ramana Gollamudi <ramana.gollamudi@huawei.com>
      
      Closes #4561 from gvramana/word_split and squashes the following commits:
      
      285c8d4 [Venkata Ramana Gollamudi] Fixed word split problem in run-example and compute-classpath
      629d0143
    • Katsunori Kanda's avatar
      [EC2] Update default Spark version to 1.2.1 · 9c807650
      Katsunori Kanda authored
      Author: Katsunori Kanda <potix2@gmail.com>
      
      Closes #4566 from potix2/ec2-update-version-1-2-1 and squashes the following commits:
      
      77e7840 [Katsunori Kanda] [EC2] Update default Spark version to 1.2.1
      9c807650
    • Kay Ousterhout's avatar
      [SPARK-5645] Added local read bytes/time to task metrics · 893d6fd7
      Kay Ousterhout authored
ksakellis I stumbled on your JIRA for this yesterday; I know it's assigned to you, but I'd already done this for my own uses a while ago, so I thought I could help save you the work of doing it!  Hopefully this doesn't duplicate any work you've already done.
      
      Here's a screenshot of what the UI looks like:
      ![image](https://cloud.githubusercontent.com/assets/1108612/6135352/c03e7276-b11c-11e4-8f11-c6aefe1f35b9.png)
      Based on a discussion with pwendell, I put the data read remotely in as an additional metric rather than showing it in brackets as you'd suggested, Kostas.  The assumption here is that the average user doesn't care about the differentiation between local / remote data, so it's better not to pollute the UI.
      
      I also added data about the local read time, which I've found very helpful for debugging, but I didn't put it in the UI because I think it's probably something not a ton of people will need to use.
      
      With this change, the total read time and total write time shown in the UI will be equal, fixing a long-term source of user confusion:
      ![image](https://cloud.githubusercontent.com/assets/1108612/6135399/25f14490-b11d-11e4-8086-20be5f4002e6.png)
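
      For anyone who wants to pull these numbers programmatically rather than from the UI, a minimal listener sketch follows; it assumes the field added here is named `localBytesRead` on `ShuffleReadMetrics`, and that `shuffleReadMetrics` is an `Option` as in Spark 1.x:

      ```scala
      import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

      // Logs the local vs. remote shuffle-read split per finished task.
      class ShuffleReadLogger extends SparkListener {
        override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
          for (metrics <- Option(taskEnd.taskMetrics);   // may be null on failure
               shuffle <- metrics.shuffleReadMetrics) {
            println(s"task ${taskEnd.taskInfo.taskId}: " +
              s"local=${shuffle.localBytesRead}B remote=${shuffle.remoteBytesRead}B")
          }
        }
      }

      // sc.addSparkListener(new ShuffleReadLogger())
      ```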
      
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      Closes #4510 from kayousterhout/SPARK-5645 and squashes the following commits:
      
      4a0182c [Kay Ousterhout] oops
      5f5da1b [Kay Ousterhout] Small style fix
      5da04cf [Kay Ousterhout] Addressed more comments from Kostas
      ba05149 [Kay Ousterhout] Remove parens
      a9dc685 [Kay Ousterhout] Kostas comment, test fix
      33d2e2d [Kay Ousterhout] Merge remote-tracking branch 'upstream/master' into SPARK-5645
      347e2cd [Kay Ousterhout] [SPARK-5645] Added local read bytes/time to task metrics
      893d6fd7
    • Michael Armbrust's avatar
      [SQL] Improve error messages · aa4ca8b8
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      Author: wangfei <wangfei1@huawei.com>
      
      Closes #4558 from marmbrus/errorMessages and squashes the following commits:
      
      5e5ab50 [Michael Armbrust] Merge pull request #15 from scwf/errorMessages
      fa38881 [wangfei] fix for grouping__id
      f279a71 [wangfei] make right references for ScriptTransformation
      d29fbde [Michael Armbrust] extra case
      1a797b4 [Michael Armbrust] comments
      d4e9015 [Michael Armbrust] add comment
      af9e668 [Michael Armbrust] no braces
      34eb3a4 [Michael Armbrust] more work
      6197cd5 [Michael Armbrust] [SQL] Better error messages for analysis failures
      aa4ca8b8
    • Antonio Navarro Perez's avatar
      [SQL][DOCS] Update sql documentation · 6a1be026
      Antonio Navarro Perez authored
Updated examples using the new API and added the DataFrame concept.
      
      Author: Antonio Navarro Perez <ajnavarro@users.noreply.github.com>
      
      Closes #4560 from ajnavarro/ajnavarro-doc-sql-update and squashes the following commits:
      
      82ebcf3 [Antonio Navarro Perez] Changed a missing JavaSQLContext to SQLContext.
      8d5376a [Antonio Navarro Perez] fixed typo
      8196b6b [Antonio Navarro Perez] [SQL][DOCS] Update sql documentation
      6a1be026
    • Sean Owen's avatar
      SPARK-5776 JIRA version not of form x.y.z breaks merge_spark_pr.py · bc57789b
      Sean Owen authored
Consider only x.y.z versions from JIRA. CC JoshRosen, who will probably know this script well.
The alternative is to call the version "2.0.0" in JIRA after all.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4570 from srowen/SPARK-5776 and squashes the following commits:
      
fffafde [Sean Owen] Consider only x.y.z versions from JIRA
      bc57789b
    • Xiangrui Meng's avatar
      [SPARK-5757][MLLIB] replace SQL JSON usage in model import/export by json4s · 99bd5006
      Xiangrui Meng authored
This PR detaches MLlib model import/export code from SQL's JSON support, and hence unblocks #4544. yhuai
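
      For context, the json4s style the import/export code moves to looks roughly like this (a sketch; the metadata fields shown are illustrative, not MLlib's exact schema):

      ```scala
      import org.json4s.DefaultFormats
      import org.json4s.JsonDSL._
      import org.json4s.jackson.JsonMethods._

      implicit val formats = DefaultFormats

      // Build model metadata as a JSON AST, serialize it, and read a field back.
      val metadata = ("class" -> "ExampleModel") ~ ("version" -> "1.0") ~ ("k" -> 3)
      val json = compact(render(metadata))     // {"class":"ExampleModel",...}
      val k = (parse(json) \ "k").extract[Int] // k == 3
      ```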
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #4555 from mengxr/SPARK-5757 and squashes the following commits:
      
      b0415e8 [Xiangrui Meng] replace SQL JSON usage by json4s
      99bd5006
    • Andrew Rowson's avatar
      [SPARK-5655] Don't chmod700 application files if running in YARN · 466b1f67
      Andrew Rowson authored
      [Was previously PR4507]
      
As per SPARK-5655, recently committed code chmod 700s all application files created on the local fs by a Spark executor. This is both unnecessary and broken on YARN, where files created in the NodeManager's working directory are already owned by the user running the job and the 'yarn' group. Group read permission is also needed for the auxiliary shuffle service to be able to read the files, as it runs as the 'yarn' user.
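
      A rough sketch of the conditional behavior described above (the YARN flag and helper name are illustrative; the real change lives in `Utils.getOrCreateLocalRootDirs`):

      ```scala
      import java.io.File
      import java.util.UUID

      // Create a per-application local dir; only tighten permissions to 0700
      // when not inside a YARN container, where the NodeManager already sets
      // ownership and the 'yarn' group needs read access for the shuffle service.
      def createLocalDir(root: String, runningInYarnContainer: Boolean): File = {
        val dir = new File(root, s"spark-${UUID.randomUUID()}")
        dir.mkdirs()
        if (!runningInYarnContainer) {
          dir.setReadable(false, false); dir.setReadable(true, true)
          dir.setWritable(false, false); dir.setWritable(true, true)
          dir.setExecutable(false, false); dir.setExecutable(true, true)
        }
        dir
      }
      ```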
      
      Author: Andrew Rowson <github@growse.com>
      
      Closes #4509 from growse/master and squashes the following commits:
      
      7ca993c [Andrew Rowson] Moved chmod700 functionality into Utils.getOrCreateLocalRootDirs
      f57ce6b [Andrew Rowson] [SPARK-5655] Don't chmod700 application files if running in a YARN container
      466b1f67
    • Oren Mazor's avatar
      ignore cache paths for RAT tests · 9a6efbcc
      Oren Mazor authored
RAT fails on cache paths; add them to .rat-excludes.
      
      Author: Oren Mazor <oren.mazor@gmail.com>
      
      Closes #4569 from orenmazor/apache_master and squashes the following commits:
      
      d0c9e7e [Oren Mazor] ignore cache paths for RAT tests
      9a6efbcc
    • Sean Owen's avatar
      SPARK-5727 [BUILD] Remove Debian packaging · 9a3ea49f
      Sean Owen authored
      (for master / 1.4 only)
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4526 from srowen/SPARK-5727.2 and squashes the following commits:
      
      83ba49c [Sean Owen] Remove Debian packaging
      9a3ea49f
  2. Feb 11, 2015
  3. Feb 10, 2015
    • Liang-Chi Hsieh's avatar
[SPARK-5714][MLlib] Refactor initial step of LDA to remove redundant operations · f86a89a2
      Liang-Chi Hsieh authored
The `initialState` of LDA performs several RDD operations that look redundant. This PR tries to simplify these operations.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #4501 from viirya/sim_lda and squashes the following commits:
      
      4870fe4 [Liang-Chi Hsieh] For comments.
      9af1487 [Liang-Chi Hsieh] Refactor initial step of LDA to remove redundant operations.
      f86a89a2
    • Reynold Xin's avatar
      [SPARK-5702][SQL] Allow short names for built-in data sources. · b8f88d32
      Reynold Xin authored
Also took the chance to fix up some style...
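
      As a usage sketch, short names let DDL (and the load/save APIs) reference a built-in source without its full package name:

      ```scala
      // Previously the fully qualified name was required, e.g.
      // USING org.apache.spark.sql.json; now the short alias works:
      sqlContext.sql(
        """CREATE TEMPORARY TABLE people
          |USING json
          |OPTIONS (path "examples/src/main/resources/people.json")
        """.stripMargin)
      ```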
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #4489 from rxin/SPARK-5702 and squashes the following commits:
      
      74f42e3 [Reynold Xin] [SPARK-5702][SQL] Allow short names for built-in data sources.
      b8f88d32
    • Andrew Or's avatar
      [SPARK-5729] Potential NPE in standalone REST API · b9691826
      Andrew Or authored
If the user specifies a bad REST URL, the server will throw an NPE instead of propagating the error back. This is because the default `ErrorServlet` has the wrong prefix. This is a one-line fix. I will add more comprehensive tests in a separate patch.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #4518 from andrewor14/rest-npe and squashes the following commits:
      
      16b15bc [Andrew Or] Correct ErrorServlet context prefix
      b9691826
    • mcheah's avatar
      [SPARK-4879] Use driver to coordinate Hadoop output committing for speculative tasks · 1cb37700
      mcheah authored
Previously, SparkHadoopWriter always committed its tasks without question. The problem is that when speculation is enabled, this can sometimes result in multiple tasks committing their output to the same file. Even though an HDFS-writing task may be re-launched due to speculation, the original task is not killed and may eventually commit as well.
      
      This can cause strange race conditions where multiple tasks that commit interfere with each other, with the result being that some partition files are actually lost entirely. For more context on these kinds of scenarios, see SPARK-4879.
      
      In Hadoop MapReduce jobs, the application master is a central coordinator that authorizes whether or not any given task can commit. Before a task commits its output, it queries the application master as to whether or not such a commit is safe, and the application master does bookkeeping as tasks are requesting commits. Duplicate tasks that would write to files that were already written to from other tasks are prohibited from committing.
      
      This patch emulates that functionality - the crucial missing component was a central arbitrator, which is now a module called the OutputCommitCoordinator. The coordinator lives on the driver and the executors can obtain a reference to this actor and request its permission to commit. As tasks commit and are reported as completed successfully or unsuccessfully by the DAGScheduler, the commit coordinator is informed of the task completion events as well to update its internal state.
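
      The bookkeeping at the heart of this is a first-committer-wins table keyed by stage and partition; a simplified sketch follows (names are illustrative; the real coordinator also handles task failures and the actor/RPC plumbing):

      ```scala
      import scala.collection.mutable

      // Driver-side arbiter: the first attempt to ask for a (stage, partition)
      // wins the right to commit; later attempts for that partition are denied.
      class CommitArbiter {
        private val authorizedCommitters = mutable.Map.empty[(Int, Int), Long]

        def canCommit(stageId: Int, partitionId: Int, attempt: Long): Boolean =
          synchronized {
            authorizedCommitters.get((stageId, partitionId)) match {
              case Some(winner) => winner == attempt // only the first requester wins
              case None =>
                authorizedCommitters((stageId, partitionId)) = attempt
                true
            }
          }
      }
      ```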
      
Future work includes more rigorous unit testing and extra optimizations should this patch cause a performance regression. It is unclear what the overall cost of communicating back to the driver on every Hadoop-committing task will be. It's also important for those hitting this issue to backport this onto previous versions of Spark, because the bug has serious consequences: data is lost.
      
      Currently, the OutputCommitCoordinator is only used when `spark.speculation` is true.  It can be disabled by setting `spark.hadoop.outputCommitCoordination.enabled=false` in SparkConf.
      
      This patch is an updated version of #4155 (by mccheah), which in turn was an updated version of this PR.
      
      Closes #4155.
      
      Author: mcheah <mcheah@palantir.com>
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #4066 from JoshRosen/SPARK-4879-sparkhadoopwriter-fix and squashes the following commits:
      
      658116b [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-4879-sparkhadoopwriter-fix
      ed783b2 [Josh Rosen] Address Andrew’s feedback.
      e7be65a [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-4879-sparkhadoopwriter-fix
      14861ea [Josh Rosen] splitID -> partitionID in a few places
      ed8b554 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-4879-sparkhadoopwriter-fix
      48d5c1c [Josh Rosen] Roll back copiesRunning change in TaskSetManager
      3969f5f [Josh Rosen] Re-enable guarding of commit coordination with spark.speculation setting.
      ede7590 [Josh Rosen] Add test to ensure that a job that denies all commits cannot complete successfully.
      97da5fe [Josh Rosen] Use actor only for RPC; call methods directly in DAGScheduler.
      f582574 [Josh Rosen] Some cleanup in OutputCommitCoordinatorSuite
      a7c0e29 [Josh Rosen] Create fake TaskInfo using dummy fields instead of Mockito.
      997b41b [Josh Rosen] Roll back unnecessary DAGSchedulerSingleThreadedProcessLoop refactoring:
      459310a [Josh Rosen] Roll back TaskSetManager changes that broke other tests.
      dd00b7c [Josh Rosen] Move CommitDeniedException to executors package; remove `@DeveloperAPI` annotation.
      c79df98 [Josh Rosen] Some misc. code style + doc changes:
      f7d69c5 [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-4879-sparkhadoopwriter-fix
      92e6dc9 [Josh Rosen] Bug fix: use task ID instead of StageID to index into authorizedCommitters.
      b344bad [Josh Rosen] (Temporarily) re-enable “always coordinate” for testing purposes.
      0aec91e [Josh Rosen] Only coordinate when speculation is enabled; add configuration option to bypass new coordination.
      594e41a [mcheah] Fixing a scalastyle error
      60a47f4 [mcheah] Writing proper unit test for OutputCommitCoordinator and fixing bugs.
      d63f63f [mcheah] Fixing compiler error
      9fe6495 [mcheah] Fixing scalastyle
      1df2a91 [mcheah] Throwing exception if SparkHadoopWriter commit denied
      d431144 [mcheah] Using more concurrency to process OutputCommitCoordinator requests.
      c334255 [mcheah] Properly handling messages that could be sent after actor shutdown.
      8d5a091 [mcheah] Was mistakenly serializing the accumulator in test suite.
      9c6a4fa [mcheah] More OutputCommitCoordinator cleanup on stop()
      78eb1b5 [mcheah] Better OutputCommitCoordinatorActor stopping; simpler canCommit
      83de900 [mcheah] Making the OutputCommitCoordinatorMessage serializable
      abc7db4 [mcheah] TaskInfo can't be null in DAGSchedulerSuite
      f135a8e [mcheah] Moving the output commit coordinator from class into method.
      1c2b219 [mcheah] Renaming oudated names for test function classes
      66a71cd [mcheah] Removing whitespace modifications
      6b543ba [mcheah] Removing redundant accumulator in unit test
      c9decc6 [mcheah] Scalastyle fixes
      bc80770 [mcheah] Unit tests for OutputCommitCoordinator
      6e6f748 [mcheah] [SPARK-4879] Use the Spark driver to authorize Hadoop commits.
      1cb37700
    • Reynold Xin's avatar
      [SQL][DataFrame] Fix column computability bug. · 7e24249a
      Reynold Xin authored
      Do not recursively strip out projects. Only strip the first level project.
      
      ```scala
      df("colA") + df("colB").as("colC")
      ```
      
      Previously, the above would construct an invalid plan.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #4519 from rxin/computability and squashes the following commits:
      
      87ff763 [Reynold Xin] Code review feedback.
      015c4fc [Reynold Xin] [SQL][DataFrame] Fix column computability.
      7e24249a
    • Cheng Hao's avatar
      [SPARK-5709] [SQL] Add EXPLAIN support in DataFrame API for debugging purpose · 45df77b8
      Cheng Hao authored
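      Usage, roughly (a sketch; the extended overload that also prints the logical plans is assumed from the shortlog):

      ```scala
      val df = sqlContext.jsonFile("examples/src/main/resources/people.json")
      df.filter(df("age") > 21).explain()     // print the physical plan
      df.filter(df("age") > 21).explain(true) // assumed: logical + physical plans
      ```
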
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #4496 from chenghao-intel/df_explain and squashes the following commits:
      
      552aa58 [Cheng Hao] Add explain support for DF
      45df77b8
    • Davies Liu's avatar
      [SPARK-5704] [SQL] [PySpark] createDataFrame from RDD with columns · ea602840
      Davies Liu authored
Deprecate inferSchema() and applySchema(); use createDataFrame() instead, which can take an optional `schema` to create a DataFrame from an RDD. The `schema` can be a StructType or a list of column names.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4498 from davies/create and squashes the following commits:
      
      08469c1 [Davies Liu] remove Scala/Java API for now
      c80a7a9 [Davies Liu] fix hive test
      d1bd8f2 [Davies Liu] cleanup applySchema
      9526e97 [Davies Liu] createDataFrame from RDD with columns
      ea602840
    • Cheng Hao's avatar
[SPARK-5683] [SQL] Avoid creating multiple JSON generators · a60aea86
      Cheng Hao authored
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #4468 from chenghao-intel/json and squashes the following commits:
      
      aeb7801 [Cheng Hao] avoid multiple json generator created
      a60aea86
    • Michael Armbrust's avatar
      [SQL] Add an exception for analysis errors. · 6195e247
      Michael Armbrust authored
      Also start from the bottom so we show the first error instead of the top error.
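
      For downstream code this means analysis failures become catchable as a dedicated type rather than a generic runtime error; a sketch, assuming the exception ends up in the `org.apache.spark.sql` package:

      ```scala
      import org.apache.spark.sql.AnalysisException

      try {
        sqlContext.sql("SELECT no_such_column FROM people").collect()
      } catch {
        // The message now points at the first (bottom-most) analysis error.
        case e: AnalysisException => println(s"Analysis failed: ${e.getMessage}")
      }
      ```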
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #4439 from marmbrus/analysisException and squashes the following commits:
      
      45862a0 [Michael Armbrust] fix hive test
      a773bba [Michael Armbrust] Merge remote-tracking branch 'origin/master' into analysisException
      f88079f [Michael Armbrust] update more cases
      fede90a [Michael Armbrust] newline
      fbf4bc3 [Michael Armbrust] move to sql
      6235db4 [Michael Armbrust] [SQL] Add an exception for analysis errors.
      6195e247
    • Yin Huai's avatar
      [SPARK-5658][SQL] Finalize DDL and write support APIs · aaf50d05
      Yin Huai authored
      https://issues.apache.org/jira/browse/SPARK-5658
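
      A rough sketch of the finalized surface, pieced together from the shortlog below (exact overloads are assumptions; see e.g. "Add SaveMode to saveAsTable" and the extra load/save APIs):

      ```scala
      import org.apache.spark.sql.SaveMode // now a Java enum

      val df = sqlContext.load("examples/src/main/resources/users.parquet")
      df.save("users_copy.parquet")                              // write back out
      df.saveAsTable("users", "parquet", SaveMode.ErrorIfExists) // assumed overload
      ```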
      
      Author: Yin Huai <yhuai@databricks.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Michael Armbrust <michael@databricks.com>
      
      Closes #4446 from yhuai/writeSupportFollowup and squashes the following commits:
      
      f3a96f7 [Yin Huai] davies's comments.
      225ff71 [Yin Huai] Use Scala TestHiveContext to initialize the Python HiveContext in Python tests.
      2306f93 [Yin Huai] Style.
      2091fcd [Yin Huai] Merge remote-tracking branch 'upstream/master' into writeSupportFollowup
      537e28f [Yin Huai] Correctly clean up temp data.
      ae4649e [Yin Huai] Fix Python test.
      609129c [Yin Huai] Doc format.
      92b6659 [Yin Huai] Python doc and other minor updates.
      cbc717f [Yin Huai] Rename dataSourceName to source.
      d1c12d3 [Yin Huai] No need to delete the duplicate rule since it has been removed in master.
      22cfa70 [Yin Huai] Merge remote-tracking branch 'upstream/master' into writeSupportFollowup
      d91ecb8 [Yin Huai] Fix test.
      4c76d78 [Yin Huai] Simplify APIs.
      3abc215 [Yin Huai] Merge remote-tracking branch 'upstream/master' into writeSupportFollowup
      0832ce4 [Yin Huai] Fix test.
      98e7cdb [Yin Huai] Python style.
      2bf44ef [Yin Huai] Python APIs.
      c204967 [Yin Huai] Format
      a10223d [Yin Huai] Merge remote-tracking branch 'upstream/master' into writeSupportFollowup
      9ff97d8 [Yin Huai] Add SaveMode to saveAsTable.
      9b6e570 [Yin Huai] Update doc.
      c2be775 [Yin Huai] Merge remote-tracking branch 'upstream/master' into writeSupportFollowup
      99950a2 [Yin Huai] Use Java enum for SaveMode.
      4679665 [Yin Huai] Remove duplicate rule.
      77d89dc [Yin Huai] Update doc.
      e04d908 [Yin Huai] Move import and add (Scala-specific) to scala APIs.
      cf5703d [Yin Huai] Add checkAnswer to Java tests.
      7db95ff [Yin Huai] Merge remote-tracking branch 'upstream/master' into writeSupportFollowup
      6dfd386 [Yin Huai] Add java test.
      f2f33ef [Yin Huai] Fix test.
      e702386 [Yin Huai] Apache header.
      b1e9b1b [Yin Huai] Format.
      ed4e1b4 [Yin Huai] Merge remote-tracking branch 'upstream/master' into writeSupportFollowup
      af9e9b3 [Yin Huai] DDL and write support API followup.
      2a6213a [Yin Huai] Update API names.
      e6a0b77 [Yin Huai] Update test.
      43bae01 [Yin Huai] Remove createTable from HiveContext.
      5ffc372 [Yin Huai] Add more load APIs to SQLContext.
      5390743 [Yin Huai] Add more save APIs to DataFrame.
      aaf50d05
    • Marcelo Vanzin's avatar
      [SPARK-5493] [core] Add option to impersonate user. · ed167e70
      Marcelo Vanzin authored
      Hadoop has a feature that allows users to impersonate other users
      when submitting applications or talking to HDFS, for example. These
impersonated users are generally referred to as "proxy users".
      
      Services such as Oozie or Hive use this feature to run applications
      as the requesting user.
      
      This change makes SparkSubmit accept a new command line option to
      run the application as a proxy user. It also fixes the plumbing
      of the user name through the UI (and a couple of other places) to
      refer to the correct user running the application, which can be
different from `sys.props("user.name")` even without proxies (e.g.
when using Kerberos).
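
      In practice this surfaces as a new spark-submit flag (`--proxy-user <name>`, assuming the final flag name matches the JIRA discussion), and it relies on the standard Hadoop proxy-user configuration: the submitting user must be whitelisted via the `hadoop.proxyuser.<user>.hosts` and `hadoop.proxyuser.<user>.groups` settings in core-site.xml.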
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #4405 from vanzin/SPARK-5493 and squashes the following commits:
      
      df82427 [Marcelo Vanzin] Clarify the reason for the special exception handling.
      05bfc08 [Marcelo Vanzin] Remove unneeded annotation.
      4840de9 [Marcelo Vanzin] Review feedback.
      8af06ff [Marcelo Vanzin] Fix usage string.
      2e4fa8f [Marcelo Vanzin] Merge branch 'master' into SPARK-5493
      b6c947d [Marcelo Vanzin] Merge branch 'master' into SPARK-5493
      0540d38 [Marcelo Vanzin] [SPARK-5493] [core] Add option to impersonate user.
      ed167e70
    • Yin Huai's avatar
      [SQL] Make Options in the data source API CREATE TABLE statements optional. · e28b6bdb
      Yin Huai authored
Users will not need to put `Options()` in a CREATE TABLE statement when no options are provided.
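
      A before/after sketch (`com.example.MySource` is hypothetical; dropping the clause only works for sources that can resolve without any options):

      ```scala
      // Before: an empty OPTIONS() clause was still required.
      sqlContext.sql("CREATE TEMPORARY TABLE t USING com.example.MySource OPTIONS ()")
      // After: the clause can be dropped entirely.
      sqlContext.sql("CREATE TEMPORARY TABLE t USING com.example.MySource")
      ```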
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #4515 from yhuai/makeOptionsOptional and squashes the following commits:
      
      1a898d3 [Yin Huai] Make options optional.
      e28b6bdb
    • Cheng Lian's avatar
      [SPARK-5725] [SQL] Fixes ParquetRelation2.equals · 2d50a010
      Cheng Lian authored
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #4513 from liancheng/spark-5725 and squashes the following commits:
      
      bf6a087 [Cheng Lian] Fixes ParquetRelation2.equals
      2d50a010
    • Sheng, Li's avatar
      [SQL][Minor] correct some comments · 91e35125
      Sheng, Li authored
      Author: Sheng, Li <OopsOutOfMemory@users.noreply.github.com>
      Author: OopsOutOfMemory <victorshengli@126.com>
      
      Closes #4508 from OopsOutOfMemory/cmt and squashes the following commits:
      
      d8a68c6 [Sheng, Li] Update ddl.scala
      f24aeaf [OopsOutOfMemory] correct style
      91e35125
    • Sephiroth-Lin's avatar
[SPARK-5644] [Core] Delete tmp dir when sc is stopped · 52983d7f
      Sephiroth-Lin authored
When we run the driver as a service and only call sc.stop() after each job, the tmp dirs created by HttpFileServer and SparkEnv are not deleted; they are only removed when the service process exits. We therefore need to delete these tmp dirs directly when the SparkContext is stopped.
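
      The long-running-service pattern this targets looks like the sketch below (the request handler is illustrative): previously the tmp dirs survived each `stop()` and piled up until the JVM exited.

      ```scala
      import org.apache.spark.{SparkConf, SparkContext}

      // One short-lived context per request inside a resident service process.
      def handleRequest(conf: SparkConf): Long = {
        val sc = new SparkContext(conf)
        try {
          sc.parallelize(1 to 100).count() // stand-in for the real job
        } finally {
          sc.stop() // with this fix, also removes the HttpFileServer/SparkEnv tmp dirs
        }
      }
      ```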
      
      Author: Sephiroth-Lin <linwzhong@gmail.com>
      
      Closes #4412 from Sephiroth-Lin/bug-fix-master-01 and squashes the following commits:
      
      fbbc785 [Sephiroth-Lin] using an interpolated string
      b968e14 [Sephiroth-Lin] using an interpolated string
      4edf394 [Sephiroth-Lin] rename the variable and update comment
      1339c96 [Sephiroth-Lin] add a member to store the reference of tmp dir
      b2018a5 [Sephiroth-Lin] check sparkFilesDir before delete
      f48a3c6 [Sephiroth-Lin] don't check sparkFilesDir, check executorId
      dd9686e [Sephiroth-Lin] format code
      b38e0f0 [Sephiroth-Lin] add dir check before delete
      d7ccc64 [Sephiroth-Lin] Change log level
      1d70926 [Sephiroth-Lin] update comment
      e2a2b1b [Sephiroth-Lin] update comment
      aeac518 [Sephiroth-Lin] Delete tmp dir when sc is stop
      c0d5b28 [Sephiroth-Lin] Delete tmp dir when sc is stop
      52983d7f
    • Brennon York's avatar
      [SPARK-5343][GraphX]: ShortestPaths traverses backwards · 58209612
      Brennon York authored
Corrected the logic in ShortestPaths so that the calculation runs forward rather than backward. Output before looked like:
      
      ```scala
      import org.apache.spark.graphx._
      val g = Graph(sc.makeRDD(Array((1L,""), (2L,""), (3L,""))), sc.makeRDD(Array(Edge(1L,2L,""), Edge(2L,3L,""))))
      lib.ShortestPaths.run(g,Array(3)).vertices.collect
      // res0: Array[(org.apache.spark.graphx.VertexId, org.apache.spark.graphx.lib.ShortestPaths.SPMap)] = Array((1,Map()), (3,Map(3 -> 0)), (2,Map()))
      lib.ShortestPaths.run(g,Array(1)).vertices.collect
      // res1: Array[(org.apache.spark.graphx.VertexId, org.apache.spark.graphx.lib.ShortestPaths.SPMap)] = Array((1,Map(1 -> 0)), (3,Map(1 -> 2)), (2,Map(1 -> 1)))
      ```
      
      And new output after the changes looks like:
      
      ```scala
      import org.apache.spark.graphx._
      val g = Graph(sc.makeRDD(Array((1L,""), (2L,""), (3L,""))), sc.makeRDD(Array(Edge(1L,2L,""), Edge(2L,3L,""))))
      lib.ShortestPaths.run(g,Array(3)).vertices.collect
      // res0: Array[(org.apache.spark.graphx.VertexId, org.apache.spark.graphx.lib.ShortestPaths.SPMap)] = Array((1,Map(3 -> 2)), (2,Map(3 -> 1)), (3,Map(3 -> 0)))
      lib.ShortestPaths.run(g,Array(1)).vertices.collect
      // res1: Array[(org.apache.spark.graphx.VertexId, org.apache.spark.graphx.lib.ShortestPaths.SPMap)] = Array((1,Map(1 -> 0)), (2,Map()), (3,Map()))
      ```
      
      Author: Brennon York <brennon.york@capitalone.com>
      
      Closes #4478 from brennonyork/SPARK-5343 and squashes the following commits:
      
      aa57f83 [Brennon York] updated to set ShortestPaths to run 'forward' rather than 'backward'
      58209612
    • MechCoder's avatar
      [SPARK-5021] [MLlib] Gaussian Mixture now supports Sparse Input · fd2c032f
      MechCoder authored
      Following discussion in the Jira.
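
      A minimal sketch of what this enables (tiny toy data; with this change the sparse vectors no longer need to be densified first):

      ```scala
      import org.apache.spark.mllib.clustering.GaussianMixture
      import org.apache.spark.mllib.linalg.Vectors

      // Two well-separated clusters, encoded sparsely.
      val data = sc.parallelize(Seq(
        Vectors.sparse(2, Array(0), Array(1.0)),
        Vectors.sparse(2, Array(0), Array(1.2)),
        Vectors.sparse(2, Array(1), Array(9.0)),
        Vectors.sparse(2, Array(1), Array(9.5))
      ))
      val model = new GaussianMixture().setK(2).setSeed(42L).run(data)
      model.gaussians.foreach(g => println(g.mu))
      ```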
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #4459 from MechCoder/sparse_gmm and squashes the following commits:
      
      1b18dab [MechCoder] Rewrite syr for sparse matrices
      e579041 [MechCoder] Add test for covariance matrix
      5cb370b [MechCoder] Separate tests for sparse data
      5e096bd [MechCoder] Alphabetize and correct error message
      e180f4c [MechCoder] [SPARK-5021] Gaussian Mixture now supports Sparse Input
      fd2c032f