  1. Feb 19, 2015
    • Zhan Zhang's avatar
      [Spark-5889] Remove pid file after stopping service. · ad6b169d
      Zhan Zhang authored
      Currently the pid file is not deleted, which can potentially cause problems after the service is stopped. The fix removes the pid file after the service stops.
      
      Author: Zhan Zhang <zhazhan@gmail.com>
      
      Closes #4676 from zhzhan/spark-5889 and squashes the following commits:
      
      eb01be1 [Zhan Zhang] solve review comments
      b4c009e [Zhan Zhang] solve review comments
      018110a [Zhan Zhang] spark-5889: remove pid file after stopping service
      088d2a2 [Zhan Zhang] squash all commits
      c1f1fa5 [Zhan Zhang] test
      ad6b169d
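The stop-then-delete behavior of the fix can be sketched in Python. This is a minimal stand-in for the `spark-daemon.sh` stop logic, not Spark code; `stop_service` and the injectable `kill` are illustrative names.

```python
import os

def stop_service(pid_file, kill=os.kill):
    """Stop the daemon recorded in pid_file, then delete the file.

    Deleting the pid file after the kill is the fix: a stale pid file
    could later match an unrelated process that reused the pid.
    """
    if not os.path.isfile(pid_file):
        return False
    with open(pid_file) as f:
        pid = int(f.read().strip())
    try:
        kill(pid, 15)  # SIGTERM, what a stop command typically sends
    except ProcessLookupError:
        pass  # process already gone; still clean up the file
    os.remove(pid_file)  # the fix: remove the pid file after stopping
    return True
```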
    • Joseph K. Bradley's avatar
      [SPARK-5902] [ml] Made PipelineStage.transformSchema public instead of private to ml · a5fed343
      Joseph K. Bradley authored
      For users to implement their own PipelineStages, we need to make PipelineStage.transformSchema public instead of private to ml. This would be nice to include in Spark 1.3.
      
      CC: mengxr
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #4682 from jkbradley/SPARK-5902 and squashes the following commits:
      
      6f02357 [Joseph K. Bradley] Made transformSchema public
      0e6d0a0 [Joseph K. Bradley] made implementations of transformSchema protected as well
      fdaf26a [Joseph K. Bradley] Made PipelineStage.transformSchema protected instead of private[ml]
      a5fed343
    • Reynold Xin's avatar
      [SPARK-5904][SQL] DataFrame API fixes. · 8ca3418e
      Reynold Xin authored
      1. Column is no longer a DataFrame to simplify class hierarchy.
      2. Don't use varargs on abstract methods (see Scala compiler bug SI-9013).
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #4686 from rxin/SPARK-5904 and squashes the following commits:
      
      fd9b199 [Reynold Xin] Fixed Python tests.
      df25cef [Reynold Xin] Non final.
      5221530 [Reynold Xin] [SPARK-5904][SQL] DataFrame API fixes.
      8ca3418e
    • Cheng Hao's avatar
      [SPARK-5825] [Spark Submit] Remove the double checking instance name when stopping the service · 94cdb05f
      Cheng Hao authored
      `spark-daemon.sh` will confirm the process id by fuzzy matching the class name while stopping the service; however, it will fail if the Java process arguments are very long (greater than 4096 characters).
      This PR loosens the check for the service process.
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #4611 from chenghao-intel/stopping_service and squashes the following commits:
      
      a0051f6 [Cheng Hao] loosen the process checking while stopping a service
      94cdb05f
    • zsxwing's avatar
      [SPARK-5423][Core] Cleanup resources in DiskMapIterator.finalize to ensure deleting the temp file · 90095bf3
      zsxwing authored
      This PR adds a `finalize` method in DiskMapIterator to clean up the resources even if an exception happens while processing data.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #4219 from zsxwing/SPARK-5423 and squashes the following commits:
      
      d4b2ca6 [zsxwing] Cleanup resources in DiskMapIterator.finalize to ensure deleting the temp file
      90095bf3
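The same safety net can be sketched in pure Python, with `weakref.finalize` standing in for the JVM `finalize` hook; the class and method names here are illustrative, not Spark's.

```python
import os
import tempfile
import weakref

class DiskBackedIterator:
    """Toy analogue of DiskMapIterator: data lives in a temp file, and the
    file is guaranteed to be deleted even if the consumer abandons the
    iterator mid-stream (e.g. after an exception)."""

    def __init__(self, items):
        fd, self.path = tempfile.mkstemp()
        with os.fdopen(fd, "w") as f:
            f.write("\n".join(str(i) for i in items))
        # Fallback cleanup: runs at garbage collection or interpreter exit
        # if cleanup() was never reached, mirroring the added finalize().
        self._finalizer = weakref.finalize(self, os.remove, self.path)

    def __iter__(self):
        with open(self.path) as f:
            for line in f:
                yield int(line)
        self.cleanup()

    def cleanup(self):
        self._finalizer()  # idempotent: deletes the temp file at most once
```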
    • Andrew Or's avatar
      [SPARK-5816] Add huge compatibility warning in DriverWrapper · 38e624a7
      Andrew Or authored
      The stability of the new submission gateway assumes that the arguments in `DriverWrapper` are consistent across multiple Spark versions. However, this is not at all clear from the code itself. In fact, this was broken in 20a60131, which is fortunately OK because both that commit and the original commit that added this gateway are part of the same release.
      
      To prevent this from happening again we should at the very least add a huge warning where appropriate.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #4687 from andrewor14/driver-wrapper-warning and squashes the following commits:
      
      7989b56 [Andrew Or] Add huge compatibility warning
      38e624a7
    • Jacek Lewandowski's avatar
      SPARK-5548: Fix for AkkaUtilsSuite failure - attempt 2 · fb87f449
      Jacek Lewandowski authored
      Author: Jacek Lewandowski <lewandowski.jacek@gmail.com>
      
      Closes #4653 from jacek-lewandowski/SPARK-5548-2-master and squashes the following commits:
      
      0e199b6 [Jacek Lewandowski] SPARK-5548: applied reviewer's comments
      843eafb [Jacek Lewandowski] SPARK-5548: Fix for AkkaUtilsSuite failure - attempt 2
      fb87f449
  2. Feb 18, 2015
    • Kay Ousterhout's avatar
      [SPARK-5846] Correctly set job description and pool for SQL jobs · e945aa61
      Kay Ousterhout authored
      marmbrus am I missing something obvious here? I verified that this fixes the problem for me (on 1.2.1) on EC2, but I'm confused about how others wouldn't have noticed this?
      
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      Closes #4630 from kayousterhout/SPARK-5846_1.3 and squashes the following commits:
      
      2022ad4 [Kay Ousterhout] [SPARK-5846] Correctly set job description and pool for SQL jobs
      e945aa61
    • Xiangrui Meng's avatar
      [SPARK-5879][MLLIB] update PIC user guide and add a Java example · d12d2ad7
      Xiangrui Meng authored
      Updated PIC user guide to reflect API changes and added a simple Java example. The API is still not very Java-friendly. I created SPARK-5990 for this issue.
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #4680 from mengxr/SPARK-5897 and squashes the following commits:
      
      847d216 [Xiangrui Meng] apache header
      87719a2 [Xiangrui Meng] remove PIC image
      2dd921f [Xiangrui Meng] update PIC user guide and add a Java example
      d12d2ad7
    • Davies Liu's avatar
      [SPARK-5722] [SQL] [PySpark] infer int as LongType · aa8f10e8
      Davies Liu authored
      The `int` is 64-bit on 64-bit machines (very common now), so we should infer it as LongType in Spark SQL.
      
      Also, LongType in SQL will come back as `int`.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4666 from davies/long and squashes the following commits:
      
      6bc6cc4 [Davies Liu] infer int as LongType
      aa8f10e8
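The new inference rule can be sketched as follows. This is a simplified stand-in for PySpark's type inference, and the string type names are illustrative:

```python
def infer_spark_type(value):
    """Map a Python value to a Spark SQL type name, with the patch's rule:
    int becomes the 64-bit LongType rather than a 32-bit integer type,
    because Python ints on 64-bit machines routinely exceed 32 bits."""
    if isinstance(value, bool):   # check bool first: bool subclasses int
        return "BooleanType"
    if isinstance(value, int):
        return "LongType"         # the change: int -> LongType
    if isinstance(value, float):
        return "DoubleType"
    if isinstance(value, str):
        return "StringType"
    raise TypeError("unsupported value: %r" % (value,))
```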
    • Reynold Xin's avatar
      [SPARK-5840][SQL] HiveContext cannot be serialized due to tuple extraction · f0e3b710
      Reynold Xin authored
      Also added test cases for checking the serializability of HiveContext and SQLContext.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #4628 from rxin/SPARK-5840 and squashes the following commits:
      
      ecb3bcd [Reynold Xin] test cases and reviews.
      55eb822 [Reynold Xin] [SPARK-5840][SQL] HiveContext cannot be serialized due to tuple extraction.
      f0e3b710
    • Burak Yavuz's avatar
      [SPARK-5507] Added documentation for BlockMatrix · a8eb92dc
      Burak Yavuz authored
      Docs for BlockMatrix. mengxr
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      
      Closes #4664 from brkyvz/SPARK-5507PR and squashes the following commits:
      
      4db30b0 [Burak Yavuz] [SPARK-5507] Added documentation for BlockMatrix
      a8eb92dc
    • Xiangrui Meng's avatar
      [SPARK-5519][MLLIB] add user guide with example code for fp-growth · 85e9d091
      Xiangrui Meng authored
      The API is still not very Java-friendly because `Array[Item]` in `freqItemsets` is recognized as `Object` in Java. We might want to define a case class to wrap the return pair to make it Java friendly.
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #4661 from mengxr/SPARK-5519 and squashes the following commits:
      
      58ccc25 [Xiangrui Meng] add user guide with example code for fp-growth
      85e9d091
    • Sean Owen's avatar
      SPARK-5669 [BUILD] [HOTFIX] Spark assembly includes incompatibly licensed... · 5aecdcf1
      Sean Owen authored
      SPARK-5669 [BUILD] [HOTFIX] Spark assembly includes incompatibly licensed libgfortran, libgcc code via JBLAS
      
      Correct exclusion path for JBLAS native libs.
      (More explanation coming soon on the mailing list re: 1.3.0 RC1)
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4673 from srowen/SPARK-5669.2 and squashes the following commits:
      
      e29693c [Sean Owen] Correct exclusion path for JBLAS native libs
      5aecdcf1
    • Kousuke Saruta's avatar
      [SPARK-4949]shutdownCallback in SparkDeploySchedulerBackend should be enclosed... · 82197ed3
      Kousuke Saruta authored
      [SPARK-4949]shutdownCallback in SparkDeploySchedulerBackend should be enclosed by synchronized block.
      
      A variable `shutdownCallback` in SparkDeploySchedulerBackend can be accessed from multiple threads, so it should be enclosed in a synchronized block.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #3781 from sarutak/SPARK-4949 and squashes the following commits:
      
      c146c93 [Kousuke Saruta] Removed "setShutdownCallback" method
      c7265dc [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-4949
      42ca528 [Kousuke Saruta] Changed the declaration of the variable "shutdownCallback" as a volatile reference instead of AtomicReference
      552df7c [Kousuke Saruta] Changed the declaration of the variable "shutdownCallback" as a volatile reference instead of AtomicReference
      f556819 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-4949
      1b60fd1 [Kousuke Saruta] Improved the locking logics
      5942765 [Kousuke Saruta] Enclosed shutdownCallback in SparkDeploySchedulerBackend by synchronized block
      82197ed3
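The pattern can be sketched in Python, with a `threading.Lock` playing the role of the Scala synchronized block; the class shape is illustrative, not the actual backend.

```python
import threading

class SchedulerBackend:
    """All reads and writes of the shared callback slot happen under one
    lock, so a concurrent setter and stop() cannot race."""

    def __init__(self):
        self._lock = threading.Lock()
        self._shutdown_callback = None

    def set_shutdown_callback(self, cb):
        with self._lock:
            self._shutdown_callback = cb

    def stop(self):
        with self._lock:  # take and clear the slot atomically
            cb, self._shutdown_callback = self._shutdown_callback, None
        if cb is not None:
            cb()  # run outside the lock so the callback cannot deadlock us
```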
    • MechCoder's avatar
      SPARK-4610 addendum: [Minor] [MLlib] Minor doc fix in GBT classification example · e79a7a62
      MechCoder authored
      numClassesForClassification has been renamed to numClasses.
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #4672 from MechCoder/minor-doc and squashes the following commits:
      
      d2ddb7f [MechCoder] Minor doc fix in GBT classification example
      e79a7a62
    • Davies Liu's avatar
      [SPARK-5878] fix DataFrame.repartition() in Python · c1b6fa98
      Davies Liu authored
      Also add tests for distinct()
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4667 from davies/repartition and squashes the following commits:
      
      79059fd [Davies Liu] add test
      cb4915e [Davies Liu] fix repartition
      c1b6fa98
    • Tor Myklebust's avatar
      Avoid deprecation warnings in JDBCSuite. · de0dd6de
      Tor Myklebust authored
      This pull request replaces calls to deprecated methods from `java.util.Date` with near-equivalents in `java.util.Calendar`.
      
      Author: Tor Myklebust <tmyklebu@gmail.com>
      
      Closes #4668 from tmyklebu/master and squashes the following commits:
      
      66215b1 [Tor Myklebust] Use GregorianCalendar instead of Timestamp get methods.
      de0dd6de
    • Cheng Lian's avatar
      [Minor] [SQL] Cleans up DataFrame variable names and toDF() calls · 61ab0854
      Cheng Lian authored
      Although we've migrated to the DataFrame API, lots of code still uses `rdd` or `srdd` as local variable names. This PR tries to address these naming inconsistencies and some other minor DataFrame related style issues.
      
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #4670 from liancheng/df-cleanup and squashes the following commits:
      
      3e14448 [Cheng Lian] Cleans up DataFrame variable names and toDF() calls
      61ab0854
    • Tathagata Das's avatar
      [SPARK-5731][Streaming][Test] Fix incorrect test in DirectKafkaStreamSuite · 3912d332
      Tathagata Das authored
      The test was incorrect: instead of counting the number of records, it counted the number of partitions of the RDD generated by the DStream, which was not its intention. I will be testing this patch multiple times to understand its flakiness.
      
      PS: This was caused by my refactoring in https://github.com/apache/spark/pull/4384/
      
      koeninger check it out.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #4597 from tdas/kafka-flaky-test and squashes the following commits:
      
      d236235 [Tathagata Das] Unignored last test.
      e9a1820 [Tathagata Das] fix test
      3912d332
  3. Feb 17, 2015
    • Yin Huai's avatar
      [SPARK-5723][SQL]Change the default file format to Parquet for CTAS statements. · e50934f1
      Yin Huai authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-5723
      
      Author: Yin Huai <yhuai@databricks.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Michael Armbrust <michael@databricks.com>
      
      Closes #4639 from yhuai/defaultCTASFileFormat and squashes the following commits:
      
      a568137 [Yin Huai] Merge remote-tracking branch 'upstream/master' into defaultCTASFileFormat
      ad2b07d [Yin Huai] Update tests and error messages.
      8af5b2a [Yin Huai] Update conf key and unit test.
      5a67903 [Yin Huai] Use data source write path for Hive's CTAS statements when no storage format/handler is specified.
      e50934f1
    • Yin Huai's avatar
      [SPARK-5875][SQL]logical.Project should not be resolved if it contains aggregates or generators · d5f12bfe
      Yin Huai authored
      https://issues.apache.org/jira/browse/SPARK-5875 has a case to reproduce the bug and explain the root cause.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #4663 from yhuai/projectResolved and squashes the following commits:
      
      472f7b6 [Yin Huai] If a logical.Project has any AggregateExpression or Generator, it's resolved field should be false.
      d5f12bfe
    • Josh Rosen's avatar
      [SPARK-4454] Revert getOrElse() cleanup in DAGScheduler.getCacheLocs() · a51fc7ef
      Josh Rosen authored
      This method is performance-sensitive and this change wasn't necessary.
      a51fc7ef
    • Josh Rosen's avatar
      [SPARK-4454] Properly synchronize accesses to DAGScheduler cacheLocs map · d46d6246
      Josh Rosen authored
      This patch addresses a race condition in DAGScheduler by properly synchronizing accesses to its `cacheLocs` map.
      
      This map is accessed by the `getCacheLocs` and `clearCacheLocs()` methods, which can be called by separate threads, since DAGScheduler's `getPreferredLocs()` method is called by SparkContext and indirectly calls `getCacheLocs()`.  If this map is cleared by the DAGScheduler event processing thread while a user thread is submitting a job and computing preferred locations, then this can cause the user thread to throw "NoSuchElementException: key not found" errors.
      
      Most accesses to DAGScheduler's internal state do not need synchronization because that state is only accessed from the event processing loop's thread.  An alternative approach to fixing this bug would be to refactor this code so that SparkContext sends the DAGScheduler a message in order to get the list of preferred locations.  However, this would involve more extensive changes to this code and would be significantly harder to backport to maintenance branches since some of the related code has undergone significant refactoring (e.g. the introduction of EventLoop).  Since `cacheLocs` is the only state that's accessed in this way, adding simple synchronization seems like a better short-term fix.
      
      See #3345 for additional context.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #4660 from JoshRosen/SPARK-4454 and squashes the following commits:
      
      12d64ba [Josh Rosen] Properly synchronize accesses to DAGScheduler cacheLocs map.
      d46d6246
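A minimal sketch of the fix, with `threading.Lock` standing in for JVM synchronization (names are illustrative):

```python
import threading

class CacheLocs:
    """Every access to the cache-locations map happens under one lock, so a
    clear() from the event-loop thread cannot interleave with a lookup from
    a user thread and surface a missing-key error."""

    def __init__(self):
        self._lock = threading.Lock()
        self._locs = {}

    def get_or_compute(self, rdd_id, compute):
        with self._lock:
            if rdd_id not in self._locs:
                self._locs[rdd_id] = compute(rdd_id)
            return self._locs[rdd_id]

    def clear(self):
        with self._lock:
            self._locs.clear()
```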
    • Burak Yavuz's avatar
      [SPARK-5811] Added documentation for maven coordinates and added Spark Packages support · ae6cfb3a
      Burak Yavuz authored
      Documentation for maven coordinates + Spark Package support. Added pyspark tests for `--packages`
      
      Author: Burak Yavuz <brkyvz@gmail.com>
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4662 from brkyvz/SPARK-5811 and squashes the following commits:
      
      56ccccd [Burak Yavuz] fixed broken test
      64cb8ee [Burak Yavuz] passed pep8 on local
      c07b81e [Burak Yavuz] fixed pep8
      a8bd6b7 [Burak Yavuz] submit PR
      4ef4046 [Burak Yavuz] ready for PR
      8fb02e5 [Burak Yavuz] merged master
      25c9b9f [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into python-jar
      560d13b [Burak Yavuz] before PR
      17d3f76 [Davies Liu] support .jar as python package
      a3eb717 [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into SPARK-5811
      c60156d [Burak Yavuz] [SPARK-5811] Added documentation for maven coordinates
      ae6cfb3a
    • Davies Liu's avatar
      [SPARK-5785] [PySpark] narrow dependency for cogroup/join in PySpark · c3d2b90b
      Davies Liu authored
      Currently, PySpark does not support narrow dependencies during cogroup/join when the two RDDs have the same partitioner, so an unnecessary extra shuffle stage comes in.
      
      The Python implementation of cogroup/join is different from the Scala one: it depends on union() and partitionBy(). This patch tries to use PartitionerAwareUnionRDD() in union() when all the RDDs have the same partitioner. It also fixes `reservePartitioner` in all the map() or mapPartitions() calls, so partitionBy() can skip the unnecessary shuffle stage.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4629 from davies/narrow and squashes the following commits:
      
      dffe34e [Davies Liu] improve test, check number of stages for join/cogroup
      1ed3ba2 [Davies Liu] Merge branch 'master' of github.com:apache/spark into narrow
      4d29932 [Davies Liu] address comment
      cc28d97 [Davies Liu] add unit tests
      940245e [Davies Liu] address comments
      ff5a0a6 [Davies Liu] skip the partitionBy() on Python side
      eb26c62 [Davies Liu] narrow dependency in PySpark
      c3d2b90b
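Why identical partitioners allow a narrow dependency can be shown with a toy hash partitioner in plain Python; no Spark is required, and the function names are illustrative.

```python
def hash_partition(records, n):
    """Place each (key, value) pair into bucket hash(key) % n."""
    parts = [[] for _ in range(n)]
    for k, v in records:
        parts[hash(k) % n].append((k, v))
    return parts

def cogroup_no_shuffle(parts_a, parts_b):
    """When both inputs used the same partitioner, equal keys already share
    a bucket index, so cogroup can zip partitions pairwise (the narrow
    dependency this patch enables) instead of shuffling."""
    out = []
    for pa, pb in zip(parts_a, parts_b):
        merged = {}
        for k, v in pa:
            merged.setdefault(k, ([], []))[0].append(v)
        for k, v in pb:
            merged.setdefault(k, ([], []))[1].append(v)
        out.append(merged)
    return out
```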
    • Yin Huai's avatar
      [SPARK-5852][SQL]Fail to convert a newly created empty metastore parquet table... · 117121a4
      Yin Huai authored
      [SPARK-5852][SQL]Fail to convert a newly created empty metastore parquet table to a data source parquet table.
      
      The problem is that after we create an empty Hive metastore Parquet table (e.g. `CREATE TABLE test (a int) STORED AS PARQUET`), Hive creates an empty dir for us, which causes our data source `ParquetRelation2` to fail to get the schema of the table. See the JIRA for the case to reproduce the bug and the exception.
      
      This PR is based on #4562 from chenghao-intel.
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-5852
      
      Author: Yin Huai <yhuai@databricks.com>
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #4655 from yhuai/CTASParquet and squashes the following commits:
      
      b8b3450 [Yin Huai] Update tests.
      2ac94f7 [Yin Huai] Update tests.
      3db3d20 [Yin Huai] Minor update.
      d7e2308 [Yin Huai] Revert changes in HiveMetastoreCatalog.scala.
      36978d1 [Cheng Hao] Update the code as feedback
      a04930b [Cheng Hao] fix bug of scan an empty parquet based table
      442ffe0 [Cheng Hao] passdown the schema for Parquet File in HiveContext
      117121a4
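The failure mode and the shape of the fix can be sketched as follows. This is a hedged simplification: the helper names and the metastore-schema fallback are illustrative, not the actual ParquetRelation2 code.

```python
import os

def parquet_schema(table_dir, read_footer, metastore_schema):
    """A freshly created metastore Parquet table is an *empty* directory,
    so there is no footer to read the schema from; fall back to the
    metastore schema instead of failing."""
    data_files = [f for f in os.listdir(table_dir)
                  if not f.startswith(("_", "."))]  # skip _metadata etc.
    if not data_files:
        return metastore_schema  # empty table: trust the metastore
    return read_footer(os.path.join(table_dir, sorted(data_files)[0]))
```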
    • Davies Liu's avatar
      [SPARK-5872] [SQL] create a sqlCtx in pyspark shell · 4d4cc760
      Davies Liu authored
      The sqlCtx will be a HiveContext if Hive is built into the assembly jar, or a SQLContext if not.
      
      It also skips the Hive tests in pyspark.sql.tests if Hive is not available.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4659 from davies/sqlctx and squashes the following commits:
      
      0e6629a [Davies Liu] sqlCtx in pyspark
      4d4cc760
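The fallback shape can be sketched with the context factories injected, so the sketch stands alone; the real shell code constructs HiveContext and SQLContext directly.

```python
def create_sql_context(sc, hive_factory, sql_factory):
    """Prefer a Hive-backed context; fall back to a plain SQL context when
    Hive support is not compiled into the assembly (construction raises)."""
    try:
        return hive_factory(sc)
    except Exception:
        return sql_factory(sc)
```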
    • Davies Liu's avatar
      [SPARK-5871] output explain in Python · 3df85dcc
      Davies Liu authored
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4658 from davies/explain and squashes the following commits:
      
      db87ea2 [Davies Liu] output explain in Python
      3df85dcc
    • Davies Liu's avatar
      [SPARK-4172] [PySpark] Progress API in Python · 445a755b
      Davies Liu authored
      This patch brings the pull-based progress API into Python, along with an example in Python.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #3027 from davies/progress_api and squashes the following commits:
      
      b1ba984 [Davies Liu] fix style
      d3b9253 [Davies Liu] add tests, mute the exception after stop
      4297327 [Davies Liu] Merge branch 'master' of github.com:apache/spark into progress_api
      969fa9d [Davies Liu] Merge branch 'master' of github.com:apache/spark into progress_api
      25590c9 [Davies Liu] update with Java API
      360de2d [Davies Liu] Merge branch 'master' of github.com:apache/spark into progress_api
      c0f1021 [Davies Liu] Merge branch 'master' of github.com:apache/spark into progress_api
      023afb3 [Davies Liu] add Python API and example for progress API
      445a755b
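The pull model (poll, don't subscribe to listeners) can be sketched with a toy tracker. The method names loosely mirror the new status-tracker API but this is not PySpark code.

```python
class ToyStatusTracker:
    """Callers poll for active stages and per-stage progress instead of
    registering listeners: the pull model this patch exposes to Python."""

    def __init__(self):
        self._stages = {}  # stage_id -> [completed_tasks, total_tasks]

    def stage_started(self, stage_id, num_tasks):
        self._stages[stage_id] = [0, num_tasks]

    def task_finished(self, stage_id):
        self._stages[stage_id][0] += 1

    def get_active_stage_ids(self):
        return sorted(s for s, (done, total) in self._stages.items()
                      if done < total)

    def get_stage_info(self, stage_id):
        done, total = self._stages[stage_id]
        return {"completed": done, "total": total}
```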
    • Michael Armbrust's avatar
      [SPARK-5868][SQL] Fix python UDFs in HiveContext and checks in SQLContext · de4836f8
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #4657 from marmbrus/pythonUdfs and squashes the following commits:
      
      a7823a8 [Michael Armbrust] [SPARK-5868][SQL] Fix python UDFs in HiveContext and checks in SQLContext
      de4836f8
    • Cheng Hao's avatar
      [SQL] [Minor] Update the HiveContext Unittest · 9d281fa5
      Cheng Hao authored
      In the unit test, the table src(key INT, value STRING) is not the same as Hive's src(key STRING, value STRING):
      https://github.com/apache/hive/blob/branch-0.13/data/scripts/q_test_init.sql
      
      In reflect.q, the test failed for the expression `reflect("java.lang.Integer", "valueOf", key, 16)`, which expects the argument `key` to be a STRING, not an INT.
      
      This PR doesn't aim to change the `src` schema, we can do that after 1.3 released, however, we probably need to re-generate all the golden files.
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #4584 from chenghao-intel/reflect and squashes the following commits:
      
      e5bdc3a [Cheng Hao] Move the test case reflect into blacklist
      184abfd [Cheng Hao] revert the change to table src1
      d9bcf92 [Cheng Hao] Update the HiveContext Unittest
      9d281fa5
    • Liang-Chi Hsieh's avatar
      [Minor][SQL] Use same function to check path parameter in JSONRelation · ac506b7c
      Liang-Chi Hsieh authored
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #4649 from viirya/use_checkpath and squashes the following commits:
      
      0f9a1a1 [Liang-Chi Hsieh] Use same function to check path parameter.
      ac506b7c
    • Liang-Chi Hsieh's avatar
      [SPARK-5862][SQL] Only transformUp the given plan once in HiveMetastoreCatalog · 4611de1c
      Liang-Chi Hsieh authored
      Currently, `ParquetConversions` in `HiveMetastoreCatalog` will transformUp the given plan multiple times if there are many metastore Parquet tables. Since the transformUp operation is recursive, it is better to perform it only once.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #4651 from viirya/parquet_atonce and squashes the following commits:
      
      c1ed29d [Liang-Chi Hsieh] Fix bug.
      e0f919b [Liang-Chi Hsieh] Only transformUp the given plan once.
      4611de1c
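The cost difference can be sketched with a toy bottom-up rewrite: instead of one full traversal per table, fold all conversions into one rule and traverse once. Plan nodes here are (name, children) tuples, not Catalyst code.

```python
def transform_up(node, rule):
    """Post-order rewrite like Catalyst's transformUp: rebuild the children
    first, then apply the rule to the rebuilt node."""
    name, children = node
    return rule((name, [transform_up(c, rule) for c in children]))

# One combined rule covering every metastore Parquet table ...
conversions = {"t1": "parquet_t1", "t2": "parquet_t2"}

def convert_tables(node):
    name, children = node
    return (conversions.get(name, name), children)

# ... applied in a single traversal, rather than once per table.
plan = ("join", [("t1", []), ("t2", [])])
converted = transform_up(plan, convert_tables)
```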
    • CodingCat's avatar
      [Minor] fix typo in SQL document · 31efb39c
      CodingCat authored
      Author: CodingCat <zhunansjtu@gmail.com>
      
      Closes #4656 from CodingCat/fix_typo and squashes the following commits:
      
      b41d15c [CodingCat] recover
      689fe46 [CodingCat] fix typo
      31efb39c
    • Davies Liu's avatar
      [SPARK-5864] [PySpark] support .jar as python package · fc4eb950
      Davies Liu authored
      A jar file containing Python sources can be used as a Python package, just like a zip file.
      
      spark-submit already puts the jar file on the PYTHONPATH; this patch also puts it on sys.path, so it can be used in the Python worker.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #4652 from davies/jar and squashes the following commits:
      
      17d3f76 [Davies Liu] support .jar as python package
      fc4eb950
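Because a .jar is just a zip archive, Python's zipimport machinery can load modules from it once it is on sys.path, which is all the worker-side change needs. A self-contained demonstration (the jar and module names are made up):

```python
import os
import sys
import tempfile
import zipfile

# Build a tiny "jar" (really a zip) holding one Python module.
jar_path = os.path.join(tempfile.mkdtemp(), "deps.jar")
with zipfile.ZipFile(jar_path, "w") as jar:
    jar.writestr("mylib.py", "ANSWER = 42\n")

# The patch's change in miniature: put the jar on sys.path, then import.
sys.path.insert(0, jar_path)
import mylib
```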
    • Sean Owen's avatar
      SPARK-5841 [CORE] [HOTFIX] Memory leak in DiskBlockManager · 49c19fdb
      Sean Owen authored
      Avoid call to remove shutdown hook being called from shutdown hook
      
      CC pwendell JoshRosen MattWhelan
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4648 from srowen/SPARK-5841.2 and squashes the following commits:
      
      51548db [Sean Owen] Avoid call to remove shutdown hook being called from shutdown hook
      49c19fdb
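The bug shape: cleanup runs either eagerly via stop() or as the registered shutdown hook, and only the eager path may deregister, because the JVM rejects removing hooks once shutdown has begun. A sketch with an in-memory hook registry (the registry and class names are illustrative):

```python
registered_hooks = []  # stand-in for the JVM's shutdown-hook registry

class DiskBlockManager:
    def __init__(self):
        self._hook = self.shutdown
        registered_hooks.append(self._hook)

    def stop(self):
        # Eager path: shutdown is not in progress, so deregistering is safe
        # (and avoids leaking one hook per instance).
        registered_hooks.remove(self._hook)
        self.shutdown()

    def shutdown(self):
        # Hook path: must NOT touch the registry here, since removing a
        # hook from inside a running shutdown hook throws in the JVM.
        self.cleaned = True
```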
    • Patrick Wendell's avatar
      MAINTENANCE: Automated closing of pull requests. · 24f358b9
      Patrick Wendell authored
      This commit exists to close the following pull requests on Github:
      
      Closes #3297 (close requested by 'andrewor14')
      Closes #3345 (close requested by 'pwendell')
      Closes #2729 (close requested by 'srowen')
      Closes #2320 (close requested by 'pwendell')
      Closes #4529 (close requested by 'andrewor14')
      Closes #2098 (close requested by 'srowen')
      Closes #4120 (close requested by 'andrewor14')
      24f358b9
    • MechCoder's avatar
      [SPARK-3381] [MLlib] Eliminate bins for unordered features in DecisionTrees · 9b746f38
      MechCoder authored
      For unordered features, it is sufficient to use splits, since the threshold of a split corresponds to the threshold of the bin's HighSplit and the LowSplit is unused.
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #4231 from MechCoder/spark-3381 and squashes the following commits:
      
      58c19a5 [MechCoder] COSMIT
      c274b74 [MechCoder] Remove unordered feature calculation in labeledPointToTreePoint
      b2b9b89 [MechCoder] COSMIT
      d3ee042 [MechCoder] [SPARK-3381] [MLlib] Eliminate bins for unordered features
      9b746f38
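Why splits suffice: for an unordered feature with k categories, a split is just the subset of categories sent left, and only 2^(k-1) - 1 subsets are distinct, because a subset and its complement describe the same split. A small enumeration sketch:

```python
from itertools import combinations

def unordered_splits(categories):
    """Enumerate the 2**(k-1) - 1 candidate splits for an unordered
    categorical feature. Fixing one category on the left side removes
    complement duplicates; the full set is skipped (empty right side)."""
    cats = sorted(categories)
    first, rest = cats[0], cats[1:]
    splits = []
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            left = {first, *combo}
            if len(left) < len(cats):
                splits.append(left)
    return splits
```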
    • xukun 00228947's avatar
      [SPARK-5661]function hasShutdownDeleteTachyonDir should use... · b271c265
      xukun 00228947 authored
      [SPARK-5661] The function hasShutdownDeleteTachyonDir should use shutdownDeleteTachyonPaths to determine whether it contains a file
      
      hasShutdownDeleteTachyonDir(file: TachyonFile) should use shutdownDeleteTachyonPaths (not shutdownDeletePaths) to determine whether it contains the file. To fix this, the two unused functions are deleted.
      
      Author: xukun 00228947 <xukun.xu@huawei.com>
      Author: viper-kun <xukun.xu@huawei.com>
      
      Closes #4418 from viper-kun/deleteunusedfun and squashes the following commits:
      
      87340eb [viper-kun] fix style
      3d6c69e [xukun 00228947] fix bug
      2bc397e [xukun 00228947] deleteunusedfun
      b271c265