  1. Aug 12, 2015
    • cody koeninger's avatar
      [SPARK-9780] [STREAMING] [KAFKA] prevent NPE if KafkaRDD instantiation … · 8ce60963
      cody koeninger authored
      …fails
      
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #8133 from koeninger/SPARK-9780 and squashes the following commits:
      
      406259d [cody koeninger] [SPARK-9780][Streaming][Kafka] prevent NPE if KafkaRDD instantiation fails
      8ce60963
    • Michael Armbrust's avatar
      [SPARK-9449] [SQL] Include MetastoreRelation's inputFiles · 660e6dcf
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #8119 from marmbrus/metastoreInputFiles.
      660e6dcf
    • Xiangrui Meng's avatar
      [SPARK-9915] [ML] stopWords should use StringArrayParam · fc1c7fd6
      Xiangrui Meng authored
      hhbyyh
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #8141 from mengxr/SPARK-9915.
      fc1c7fd6
    • Xiangrui Meng's avatar
      [SPARK-9912] [MLLIB] QRDecomposition should use QType and RType for type names... · e6aef557
      Xiangrui Meng authored
      [SPARK-9912] [MLLIB] QRDecomposition should use QType and RType for type names instead of UType and VType
      
      hhbyyh
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #8140 from mengxr/SPARK-9912.
      e6aef557
    • Holden Karau's avatar
      [SPARK-9909] [ML] [TRIVIAL] move weightCol to shared params · 6e409bc1
      Holden Karau authored
      As per the TODO move weightCol to Shared Params.
      
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #8144 from holdenk/SPARK-9909-move-weightCol-toSharedParams.
      6e409bc1
    • Xiangrui Meng's avatar
      [SPARK-9913] [MLLIB] LDAUtils should be private · caa14d9d
      Xiangrui Meng authored
      feynmanliang
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #8142 from mengxr/SPARK-9913.
      caa14d9d
    • Yin Huai's avatar
      [SPARK-9894] [SQL] Json writer should handle MapData. · 7035d880
      Yin Huai authored
      https://issues.apache.org/jira/browse/SPARK-9894
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #8137 from yhuai/jsonMapData.
      7035d880
    • Michel Lemay's avatar
      [SPARK-9826] [CORE] Fix cannot use custom classes in log4j.properties · ab7e721c
      Michel Lemay authored
      Refactor Utils class and create ShutdownHookManager.
      
      NOTE: Wasn't able to run /dev/run-tests on a Windows machine.
      Manual tests were conducted locally using a custom log4j.properties file with a Redis appender and logstash formatter (bundled in the fat-jar submitted to Spark).
      
      ex:
      log4j.rootCategory=WARN,console,redis
      log4j.appender.console=org.apache.log4j.ConsoleAppender
      log4j.appender.console.target=System.err
      log4j.appender.console.layout=org.apache.log4j.PatternLayout
      log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
      
      log4j.logger.org.eclipse.jetty=WARN
      log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
      log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
      log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
      log4j.logger.org.apache.spark.graphx.Pregel=INFO
      
      log4j.appender.redis=com.ryantenney.log4j.FailoverRedisAppender
      log4j.appender.redis.endpoints=hostname:port
      log4j.appender.redis.key=mykey
      log4j.appender.redis.alwaysBatch=false
      log4j.appender.redis.layout=net.logstash.log4j.JSONEventLayoutV1
      
      Author: michellemay <mlemay@gmail.com>
      
      Closes #8109 from michellemay/SPARK-9826.
      ab7e721c
    • Niranjan Padmanabhan's avatar
      [SPARK-9092] Fixed incompatibility when both num-executors and dynamic... · 738f3539
      Niranjan Padmanabhan authored
      … allocation are set. Now, dynamic allocation is set to false when num-executors is explicitly specified as an argument. Consequently, executorAllocationManager is not initialized in the SparkContext.
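
      The rule above can be illustrated with a small plain-Python sketch. It is not Spark's code: the function name and the dictionary-based configuration are hypothetical, and it assumes that `--num-executors` shows up as the `spark.executor.instances` property.

```python
def dynamic_allocation_enabled(conf):
    """Return True only if dynamic allocation is requested AND no explicit
    executor count was given (hypothetical helper, not Spark's API)."""
    requested = conf.get("spark.dynamicAllocation.enabled", "false").lower() == "true"
    explicit_count = "spark.executor.instances" in conf
    return requested and not explicit_count


# Both settings present: the explicit executor count wins.
both = {"spark.dynamicAllocation.enabled": "true", "spark.executor.instances": "4"}
# Only dynamic allocation requested: it stays enabled.
only_dynamic = {"spark.dynamicAllocation.enabled": "true"}

print(dynamic_allocation_enabled(both))
print(dynamic_allocation_enabled(only_dynamic))
```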
      
      Author: Niranjan Padmanabhan <niranjan.padmanabhan@cloudera.com>
      
      Closes #7657 from neurons/SPARK-9092.
      738f3539
    • Reynold Xin's avatar
      [SPARK-9907] [SQL] Python crc32 is mistakenly calling md5 · a17384fa
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #8138 from rxin/SPARK-9907.
      a17384fa
    • Xiangrui Meng's avatar
      [SPARK-8967] [DOC] add Since annotation · 6f60298b
      Xiangrui Meng authored
      Add `Since` as a Scala annotation. The benefit is that we can use it without having explicit JavaDoc. This is useful for inherited methods. The limitation is that it doesn't show up in the generated Java API documentation. This might be fixed by modifying genjavadoc. I think we could leave it as a TODO.
      
      This is how the generated Scala doc looks:
      
      `since` JavaDoc tag:
      
      ![screen shot 2015-08-11 at 10 00 37 pm](https://cloud.githubusercontent.com/assets/829644/9230761/fa72865c-40d8-11e5-807e-0f3c815c5acd.png)
      
      `Since` annotation:
      
      ![screen shot 2015-08-11 at 10 00 28 pm](https://cloud.githubusercontent.com/assets/829644/9230764/0041d7f4-40d9-11e5-8124-c3f3e5d5b31f.png)
      
      rxin
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #8131 from mengxr/SPARK-8967.
      6f60298b
    • Joseph K. Bradley's avatar
      [SPARK-9789] [ML] Added logreg threshold param back · 551def5d
      Joseph K. Bradley authored
      Reinstated LogisticRegression.threshold Param for binary compatibility.  Param thresholds overrides threshold, if set.
      
      CC: mengxr dbtsai feynmanliang
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #8079 from jkbradley/logreg-reinstate-threshold.
      551def5d
    • Yanbo Liang's avatar
      [SPARK-9766] [ML] [PySpark] check and add miss docs for PySpark ML · 762bacc1
      Yanbo Liang authored
      Check for and add missing docs for PySpark ML (this issue only checks for missing docs in o.a.s.ml, not o.a.s.mllib).
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #8059 from yanboliang/SPARK-9766.
      762bacc1
    • Brennan Ashton's avatar
      [SPARK-9726] [PYTHON] PySpark DF join no longer accepts on=None · 60103ecd
      Brennan Ashton authored
      rxin
      
      First pull request for Spark so let me know if I am missing anything
      The contribution is my original work and I license the work to the project under the project's open source license.
      
      Author: Brennan Ashton <bashton@brennanashton.com>
      
      Closes #8016 from btashton/patch-1.
      60103ecd
    • Joseph K. Bradley's avatar
      [SPARK-9847] [ML] Modified copyValues to distinguish between default, explicit param values · 70fe5588
      Joseph K. Bradley authored
      From JIRA: Currently, Params.copyValues copies default parameter values to the paramMap of the target instance, rather than the defaultParamMap. It should copy to the defaultParamMap because explicitly setting a parameter can change the semantics.
      This issue arose in SPARK-9789, where 2 params "threshold" and "thresholds" for LogisticRegression can have mutually exclusive values. If thresholds is set, then fit() will copy the default value of threshold as well, easily resulting in inconsistent settings for the 2 params.
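
      To make the distinction concrete, here is a toy model in plain Python. It is only a sketch of the idea, not the Spark ML `Params` implementation; the class and method names are made up for illustration.

```python
class Params:
    """Toy parameter holder with separate maps for defaults and explicit values."""

    def __init__(self):
        self.default_param_map = {}
        self.param_map = {}

    def set_default(self, name, value):
        self.default_param_map[name] = value

    def set(self, name, value):
        self.param_map[name] = value

    def is_set(self, name):
        # True only for values the user set explicitly.
        return name in self.param_map

    def get(self, name):
        # Explicit values take precedence over defaults.
        if name in self.param_map:
            return self.param_map[name]
        return self.default_param_map[name]

    def copy_values(self, target):
        # Defaults go to the target's defaults, explicit values to its
        # explicit map, so the target can still tell the two apart.
        target.default_param_map.update(self.default_param_map)
        target.param_map.update(self.param_map)
        return target


estimator = Params()
estimator.set_default("threshold", 0.5)
estimator.set("thresholds", [0.3, 0.7])
model = estimator.copy_values(Params())
```

      After the copy, `model.is_set("threshold")` is still false, so code that treats an explicitly set `threshold` as conflicting with `thresholds` does not see a conflict.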
      
      CC: mengxr
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #8115 from jkbradley/copyvalues-fix.
      70fe5588
    • Marcelo Vanzin's avatar
      [SPARK-9804] [HIVE] Use correct value for isSrcLocal parameter. · 57ec27dd
      Marcelo Vanzin authored
      If the correct parameter is not provided, Hive will run into an error
      because it calls methods that are specific to the local filesystem to
      copy the data.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #8086 from vanzin/SPARK-9804.
      57ec27dd
    • Andrew Or's avatar
      [SPARK-9747] [SQL] Avoid starving an unsafe operator in aggregation · e0110792
      Andrew Or authored
      This is the sister patch to #8011, but for aggregation.
      
      In a nutshell: create the `TungstenAggregationIterator` before computing the parent partition. Internally this creates a `BytesToBytesMap` which acquires a page in the constructor as of this patch. This ensures that the aggregation operator is not starved since we reserve at least 1 page in advance.
      
      rxin yhuai
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #8038 from andrewor14/unsafe-starve-memory-agg.
      e0110792
    • Yuhao Yang's avatar
      [SPARK-7583] [MLLIB] User guide update for RegexTokenizer · 66d87c1d
      Yuhao Yang authored
      jira: https://issues.apache.org/jira/browse/SPARK-7583
      
      User guide update for RegexTokenizer
      
      Author: Yuhao Yang <hhbyyh@gmail.com>
      
      Closes #7828 from hhbyyh/regexTokenizerDoc.
      66d87c1d
    • Andrew Or's avatar
      [SPARK-9795] Dynamic allocation: avoid double counting when killing same executor twice · be5d1912
      Andrew Or authored
      This is based on KaiXinXiaoLei's changes in #7716.
      
      The issue is that when someone calls `sc.killExecutor("1")` on the same executor twice quickly, then the executor target will be adjusted downwards by 2 instead of 1 even though we're only actually killing one executor. In certain cases where we don't adjust the target back upwards quickly, we'll end up with jobs hanging.
      
      This is a common danger because there are many places where this is called:
      - `HeartbeatReceiver` kills an executor that has not been sending heartbeats
      - `ExecutorAllocationManager` kills an executor that has been idle
      - The user code might call this, which may interfere with the previous callers
      
      While it's not clear whether this fixes SPARK-9745, fixing this potential race condition seems like a strict improvement. I've added a regression test to illustrate the issue.
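
      The double-counting problem and one way to guard against it can be sketched in plain Python. This is an illustrative model with hypothetical names, not the scheduler backend code touched by the patch.

```python
class ExecutorTracker:
    """Toy model of an executor target that must drop by one per killed executor."""

    def __init__(self, executor_ids):
        self.alive = set(executor_ids)
        self.pending_removal = set()
        self.target = len(self.alive)

    def kill_executor(self, executor_id):
        # Ignore unknown executors and executors that are already being
        # killed, so a repeated request cannot lower the target twice.
        if executor_id not in self.alive or executor_id in self.pending_removal:
            return False
        self.pending_removal.add(executor_id)
        self.target -= 1
        return True


tracker = ExecutorTracker(["1", "2", "3"])
first = tracker.kill_executor("1")   # accepted, target goes from 3 to 2
second = tracker.kill_executor("1")  # duplicate request, target unchanged
```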
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #8078 from andrewor14/da-double-kill.
      be5d1912
    • Tom White's avatar
      [SPARK-8625] [CORE] Propagate user exceptions in tasks back to driver · 2e680668
      Tom White authored
      This allows clients to retrieve the original exception from the
      cause field of the SparkException that is thrown by the driver.
      If the original exception is not in fact Serializable then it will
      not be returned, but the message and stacktrace will be. (All Java
      Throwables implement the Serializable interface, but this is no
      guarantee that a particular implementation can actually be
      serialized.)
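
      The same fallback idea can be sketched with Python's `pickle` and exception chaining. This is an analogy only, not Spark's implementation; `TaskFailure`, `capture` and `rethrow` are hypothetical names.

```python
import pickle
import threading
import traceback


class TaskFailure(Exception):
    """Driver-side error; the original exception, if available, is its __cause__."""


def capture(exc):
    """Task side: always keep message and stack trace, plus the pickled
    exception when it can actually be serialized."""
    info = {
        "message": str(exc),
        "stack": "".join(traceback.format_exception(type(exc), exc, exc.__traceback__)),
        "payload": None,
    }
    try:
        info["payload"] = pickle.dumps(exc)
    except Exception:
        pass  # not serializable in practice: fall back to message and stack
    return info


def rethrow(info):
    """Driver side: raise TaskFailure, chaining the original exception if present."""
    err = TaskFailure(info["message"] + "\n" + info["stack"])
    if info["payload"] is not None:
        raise err from pickle.loads(info["payload"])
    raise err


# A plain exception pickles fine; one carrying a lock does not.
ok = capture(ValueError("bad input"))
bad = capture(ValueError("bad", threading.Lock()))
```

      A caller that catches `TaskFailure` can inspect `__cause__` to get the original exception when it survived serialization, and otherwise still has the message and stack trace text.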
      
      Author: Tom White <tom@cloudera.com>
      
      Closes #7014 from tomwhite/propagate-user-exceptions.
      2e680668
    • Cheng Lian's avatar
      [SPARK-9407] [SQL] Relaxes Parquet ValidTypeMap to allow ENUM predicates to be pushed down · 3ecb3794
      Cheng Lian authored
      This PR adds a hacky workaround for PARQUET-201, and should be removed once we upgrade to parquet-mr 1.8.1 or higher versions.
      
      In Parquet, not all types of columns can be used for filter push-down optimization.  The set of valid column types is controlled by `ValidTypeMap`.  Unfortunately, in parquet-mr 1.7.0 and prior versions, this limitation is too strict, and doesn't allow `BINARY (ENUM)` columns to be pushed down.  On the other hand, `BINARY (ENUM)` is commonly seen in Parquet files written by libraries like `parquet-avro`.
      
      This restriction is problematic for Spark SQL, because Spark SQL doesn't have a type that maps to Parquet `BINARY (ENUM)` directly, and always converts `BINARY (ENUM)` to Catalyst `StringType`.  Thus, a predicate involving a `BINARY (ENUM)` column is recognized as one involving a string field and can be pushed down by the query optimizer.  Such predicates are perfectly legal, except that they fail the `ValidTypeMap` check.
      
      The workaround added here relaxes `ValidTypeMap` to include `BINARY (ENUM)`.  I also took the chance to simplify `ParquetCompatibilityTest` a little when adding the regression test.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #8107 from liancheng/spark-9407/parquet-enum-filter-push-down.
      3ecb3794
    • Yijie Shen's avatar
      [SPARK-9182] [SQL] Filters are not passed through to jdbc source · 9d082245
      Yijie Shen authored
      This PR fixes the failure to push filters down to the JDBC source, which was caused by `Cast` during pattern matching.
      
      When comparing columns of different types, a cast on the column is often needed. The predicate then no longer matches the pattern directly on `Attribute`, so the filter fails to be pushed down.
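
      The matching problem can be pictured with a toy expression tree in Python. This is one possible way to look through a cast, shown under simplifying assumptions; it is not the Catalyst code, and all class and function names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    name: str


@dataclass(frozen=True)
class Cast:
    child: object
    to_type: str


@dataclass(frozen=True)
class EqualTo:
    left: object
    right: object


def strip_cast(expr):
    """Unwrap any number of nested Cast nodes."""
    while isinstance(expr, Cast):
        expr = expr.child
    return expr


def to_source_filter(pred):
    """Return (column, op, value) if the predicate can be pushed down, else None."""
    if isinstance(pred, EqualTo):
        column = strip_cast(pred.left)
        # Only push down comparisons of a column against a literal value.
        if isinstance(column, Attribute) and not isinstance(pred.right, (Attribute, Cast)):
            return (column.name, "=", pred.right)
    return None


plain = to_source_filter(EqualTo(Attribute("age"), 30))
casted = to_source_filter(EqualTo(Cast(Attribute("age"), "double"), 30.0))
```

      Whether dropping a cast is semantically safe depends on the types involved, so a real planner would need to be more careful than this sketch.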
      
      Author: Yijie Shen <henry.yijieshen@gmail.com>
      
      Closes #8049 from yjshen/jdbc_pushdown.
      9d082245
    • Timothy Chen's avatar
      [SPARK-9575] [MESOS] Add documentation around Mesos shuffle service. · 741a29f9
      Timothy Chen authored
      andrewor14
      
      Author: Timothy Chen <tnachen@gmail.com>
      
      Closes #7907 from tnachen/mesos_shuffle.
      741a29f9
    • Timothy Chen's avatar
      [SPARK-8798] [MESOS] Allow additional uris to be fetched with mesos · 5c99d8bf
      Timothy Chen authored
      Some users like to download additional files in their sandbox that they can refer to from their spark program, or even later mount these files to another directory.
      
      Author: Timothy Chen <tnachen@gmail.com>
      
      Closes #7195 from tnachen/mesos_files.
      5c99d8bf
    • Carson Wang's avatar
      [SPARK-9426] [WEBUI] Job page DAG visualization is not shown · bab89232
      Carson Wang authored
      To reproduce the issue, go to the stage page and click DAG Visualization once, then go to the job page to show the job DAG visualization. You will only see the first stage of the job.
      Root cause: the JavaScript uses local storage to remember your selection. Once you click the stage DAG visualization, the script sets `expand-dag-viz-arrow-stage` to true in local storage. When you go to the job page, the JS checks `expand-dag-viz-arrow-stage` in local storage first and tries to show the stage DAG visualization on the job page.
      To fix this, I set an id on the DAG span to distinguish the job page from the stage page. In the JS code, we check the id and local storage together to make sure we show the correct DAG visualization.
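
      The combined check can be modelled in a few lines of Python, with a dictionary standing in for the browser's local storage. The ids `"stage"` and `"job"` and the way the key is built from the id are assumptions made for this sketch; only the key `expand-dag-viz-arrow-stage` comes from the description above.

```python
def should_expand_dag(local_storage, dag_span_id):
    """Expand the DAG only if the remembered choice belongs to this page's DAG."""
    key = "expand-dag-viz-arrow-" + dag_span_id
    return local_storage.get(key) == "true"


# The user expanded the DAG on the stage page only.
storage = {"expand-dag-viz-arrow-stage": "true"}

on_stage_page = should_expand_dag(storage, "stage")
on_job_page = should_expand_dag(storage, "job")
```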
      
      Author: Carson Wang <carson.wang@intel.com>
      
      Closes #8104 from carsonwang/SPARK-9426.
      bab89232
    • zsxwing's avatar
      [SPARK-9829] [WEBUI] Display the update value for peak execution memory · 4e3f4b93
      zsxwing authored
      The peak execution memory is not correct because it shows the sum of finished tasks' values when a task finishes.
      
      This PR fixes it by using the update value rather than the accumulator value.
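
      The difference between the two values is easy to see in a toy accumulator. This is a plain-Python illustration with made-up numbers, not the web UI listener code.

```python
class Accumulator:
    """Toy accumulator: `value` is the running sum over all finished tasks."""

    def __init__(self):
        self.value = 0

    def add(self, update):
        self.value += update


acc = Accumulator()
rows = []
for task_id, peak in [(0, 100), (1, 250), (2, 40)]:
    acc.add(peak)
    rows.append({
        "task": task_id,
        "from_value": acc.value,  # running sum: wrong as a per-task figure
        "from_update": peak,      # this task's own contribution
    })
```

      Reading `from_value` per task gives 100, 350 and 390, while `from_update` gives the per-task peaks 100, 250 and 40.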
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #8121 from zsxwing/SPARK-9829.
      4e3f4b93
    • Rohit Agarwal's avatar
      [SPARK-9806] [WEB UI] Don't share ReplayListenerBus between multiple applications · a807fcbe
      Rohit Agarwal authored
      Author: Rohit Agarwal <rohita@qubole.com>
      
      Closes #8088 from mindprince/SPARK-9806.
      a807fcbe
    • xutingjun's avatar
      [SPARK-8366] maxNumExecutorsNeeded should properly handle failed tasks · b85f9a24
      xutingjun authored
      Author: xutingjun <xutingjun@huawei.com>
      Author: meiyoula <1039320815@qq.com>
      
      Closes #6817 from XuTingjun/SPARK-8366.
      b85f9a24
    • Josh Rosen's avatar
      [SPARK-9854] [SQL] RuleExecutor.timeMap should be thread-safe · b1581ac2
      Josh Rosen authored
      `RuleExecutor.timeMap` is currently a non-thread-safe mutable HashMap; this can lead to infinite loops if multiple threads are concurrently modifying the map.  I believe that this is responsible for some hangs that I've observed in HiveQuerySuite.
      
      This patch addresses this by using a Guava `AtomicLongMap`.
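
      For readers unfamiliar with the pattern, the following Python sketch shows a lock-protected counter map updated from several threads. It is a stand-in for the idea only: the class below merely borrows the `AtomicLongMap` name, and the rule name is an arbitrary label.

```python
import threading
from collections import defaultdict


class AtomicLongMap:
    """Tiny lock-based stand-in for a map of atomically updated counters."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = defaultdict(int)

    def add_and_get(self, key, delta):
        # The read-modify-write happens under the lock, so no update is lost.
        with self._lock:
            self._values[key] += delta
            return self._values[key]

    def get(self, key):
        with self._lock:
            return self._values.get(key, 0)


time_map = AtomicLongMap()


def record(rule, nanos, repeats):
    for _ in range(repeats):
        time_map.add_and_get(rule, nanos)


threads = [threading.Thread(target=record, args=("ConstantFolding", 5, 1000))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```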
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #8120 from JoshRosen/rule-executor-time-map-fix.
      b1581ac2
    • Davies Liu's avatar
      [SPARK-9831] [SQL] fix serialization with empty broadcast · c3e9a120
      Davies Liu authored
      Author: Davies Liu <davies@databricks.com>
      
      Closes #8117 from davies/fix_serialization and squashes the following commits:
      
      d21ac71 [Davies Liu] fix serialization with empty broadcast
      c3e9a120
  2. Aug 11, 2015
    • Eric Liang's avatar
      [SPARK-9713] [ML] Document SparkR MLlib glm() integration in Spark 1.5 · 74a293f4
      Eric Liang authored
      This documents the use of R model formulae in the SparkR guide. Also fixes some bugs in the R api doc.
      
      mengxr
      
      Author: Eric Liang <ekl@databricks.com>
      
      Closes #8085 from ericl/docs.
      74a293f4
    • Patrick Wendell's avatar
      [SPARK-1517] Refactor release scripts to facilitate nightly publishing · 3ef0f329
      Patrick Wendell authored
      This update contains some code changes to the release scripts that allow easier nightly publishing. I've been using these new scripts on Jenkins for cutting and publishing nightly snapshots for the last month or so, and it has been going well. I'd like to get them merged back upstream so this can be maintained by the community.
      
      The main changes are:
      1. Separates the release tagging from various build possibilities for an already tagged release (`release-tag.sh` and `release-build.sh`).
      2. Allow for injecting credentials through the environment, including GPG keys. This is then paired with secure key injection in Jenkins.
      3. Support for copying build results to a remote directory, and also "rotating" results, e.g. the ability to keep the last N copies of binary or doc builds.
      
      I'm happy if anyone wants to take a look at this - it's not user facing but an internal utility used for generating releases.
      
      Author: Patrick Wendell <patrick@databricks.com>
      
      Closes #7411 from pwendell/release-script-updates and squashes the following commits:
      
      74f9beb [Patrick Wendell] Moving maven build command to a variable
      233ce85 [Patrick Wendell] [SPARK-1517] Refactor release scripts to facilitate nightly publishing
      3ef0f329
    • Andrew Or's avatar
      [SPARK-9649] Fix flaky test MasterSuite again - disable REST · ca8f70e9
      Andrew Or authored
      The REST server is not actually used in most tests and so we can disable it. It is a source of flakiness because it tries to bind to a specific port in vain. There was also some code that avoided the shuffle service in tests. This is actually not necessary because the shuffle service is already off by default.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #8084 from andrewor14/fix-master-suite-again.
      ca8f70e9
    • Reynold Xin's avatar
      [SPARK-9849] [SQL] DirectParquetOutputCommitter qualified name should be backward compatible · afa757c9
      Reynold Xin authored
      DirectParquetOutputCommitter was moved in SPARK-9763. However, users can explicitly set the class as a config option, so we must be able to resolve the old committer qualified name.
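
      One common way to keep an old configured name working is an alias table consulted before class lookup. The sketch below shows that idea in Python; the package names are placeholders invented for the example and are not the real old or new locations of the committer.

```python
# Hypothetical old -> new qualified names (placeholders, not Spark's packages).
LEGACY_CLASS_NAMES = {
    "example.oldpkg.DirectParquetOutputCommitter":
        "example.newpkg.DirectParquetOutputCommitter",
}


def resolve_class_name(configured):
    """Map a legacy qualified name to its new location; pass others through."""
    return LEGACY_CLASS_NAMES.get(configured, configured)


old_style = resolve_class_name("example.oldpkg.DirectParquetOutputCommitter")
new_style = resolve_class_name("example.newpkg.DirectParquetOutputCommitter")
```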
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #8114 from rxin/SPARK-9849.
      afa757c9
    • Marcelo Vanzin's avatar
      [SPARK-9074] [LAUNCHER] Allow arbitrary Spark args to be set. · 5a5bbc29
      Marcelo Vanzin authored
      This change allows any Spark argument to be added to the app to
      be started using SparkLauncher. Known arguments are properly
      validated, while unknown arguments are allowed so that the
      library can launch newer Spark versions (in case SPARK_HOME points
      at one).
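
      The validate-known, accept-unknown policy can be sketched as follows. This is plain Python for illustration and does not reproduce the `SparkLauncher` API; the function name and the small option table are assumptions, although the three listed options do exist in `spark-submit`.

```python
# Known options and whether each one takes a value.
KNOWN_OPTIONS = {"--master": True, "--deploy-mode": True, "--verbose": False}


def add_spark_arg(args, name, value=None):
    """Append an argument, validating its shape only if the option is known."""
    if name in KNOWN_OPTIONS:
        takes_value = KNOWN_OPTIONS[name]
        if takes_value and value is None:
            raise ValueError(name + " requires a value")
        if not takes_value and value is not None:
            raise ValueError(name + " does not take a value")
    # Unknown options are passed through untouched, so options added by a
    # newer Spark version can still be used.
    args.append(name)
    if value is not None:
        args.append(value)
    return args


known = add_spark_arg([], "--master", "local[2]")
unknown = add_spark_arg([], "--some-future-option", "x")
```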
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #7975 from vanzin/SPARK-9074 and squashes the following commits:
      
      b5e451a [Marcelo Vanzin] [SPARK-9074] [launcher] Allow arbitrary Spark args to be set.
      5a5bbc29
    • Andrew Or's avatar
      [HOTFIX] Fix style error caused by 017b5de0 · 736af95b
      Andrew Or authored
      736af95b
    • Sudhakar Thota's avatar
      [SPARK-8925] [MLLIB] Add @since tags to mllib.util · 017b5de0
      Sudhakar Thota authored
      Went through the change history of the file MLUtils.scala and picked the version in which each change went in.
      
      Author: Sudhakar Thota <sudhakarthota@yahoo.com>
      Author: Sudhakar Thota <sudhakarthota@sudhakars-mbp-2.usca.ibm.com>
      
      Closes #7436 from sthota2014/SPARK-8925_thotas.
      017b5de0
    • Feynman Liang's avatar
      [SPARK-9788] [MLLIB] Fix LDA Binary Compatibility · be3e2716
      Feynman Liang authored
      1. Add `asymmetricDocConcentration` and revert docConcentration changes. If the (internal) doc concentration vector is a single value, `getDocConcentration` returns it. If it is a constant vector, `getDocConcentration` returns the first item, and fails otherwise.
      2. Give `LDAModel.gammaShape` a default value in `LDAModel` concrete class constructors.
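
      The getter behaviour described in item 1 amounts to the following logic, written here as a Python sketch with a plain list in place of an MLlib vector. The function name is illustrative, not the actual method.

```python
def get_doc_concentration(asymmetric_doc_concentration):
    """Return a single concentration value, or fail if the vector is not constant."""
    vec = list(asymmetric_doc_concentration)
    if len(vec) == 1:
        return vec[0]
    if vec and all(v == vec[0] for v in vec):
        return vec[0]
    raise ValueError("doc concentration is asymmetric; no single value to return")


single = get_doc_concentration([0.1])
constant = get_doc_concentration([0.5, 0.5, 0.5])
```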
      
      jkbradley
      
      Author: Feynman Liang <fliang@databricks.com>
      
      Closes #8077 from feynmanliang/SPARK-9788 and squashes the following commits:
      
      6b07bc8 [Feynman Liang] Code review changes
      9d6a71e [Feynman Liang] Add asymmetricAlpha alias
      bf4e685 [Feynman Liang] Asymmetric docConcentration
      4cab972 [Feynman Liang] Default gammaShape
      be3e2716
    • Xiangrui Meng's avatar
      Closes #1290 · 423cdfd8
      Xiangrui Meng authored
      Closes #4934
      423cdfd8
    • zsxwing's avatar
      [SPARK-9824] [CORE] Fix the issue that InternalAccumulator leaks WeakReference · f16bc68d
      zsxwing authored
      `InternalAccumulator.create` doesn't call `registerAccumulatorForCleanup` to register itself with ContextCleaner, so `WeakReference`s for these accumulators in `Accumulators.originals` won't be removed.
      
      This PR added `registerAccumulatorForCleanup` for internal accumulators to avoid the memory leak.
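
      The leak pattern has a close analogue in Python's `weakref` module: a weak reference stored in a map stays there after its target is collected unless something removes it. The sketch below is only that analogy; it does not model `ContextCleaner`, and the names are hypothetical.

```python
import gc
import weakref


class Accumulator:
    pass


originals = {}  # id -> weak reference, loosely modelled on Accumulators.originals


def create(acc_id, register_for_cleanup):
    acc = Accumulator()
    if register_for_cleanup:
        # The callback drops the map entry once the accumulator is collected.
        originals[acc_id] = weakref.ref(
            acc, lambda _ref, key=acc_id: originals.pop(key, None))
    else:
        # No cleanup registered: the dead reference stays in the map forever.
        originals[acc_id] = weakref.ref(acc)
    return acc


leaky = create(1, register_for_cleanup=False)
cleaned = create(2, register_for_cleanup=True)
del leaky, cleaned
gc.collect()
```

      After collection, entry 1 is still in `originals` and points at nothing, while entry 2 has been removed.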
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #8108 from zsxwing/internal-accumulators-leak.
      f16bc68d