  1. Nov 16, 2016
    • Holden Karau
      [SPARK-1267][SPARK-18129] Allow PySpark to be pip installed · 6a3cbbc0
      Holden Karau authored
      ## What changes were proposed in this pull request?
      
      This PR aims to provide a pip-installable PySpark package. It copies the JARs and packages them with the Python code (to avoid problems from using one version of the Python code with a different version of the JARs). It does not currently publish to PyPI, but that is the natural follow-up (SPARK-18129).
      
      Done:
      - pip installable on conda [manually tested]
      - setup.py installed on a non-pip-managed system (RHEL) with YARN [manually tested]
      - Automated testing of this (virtualenv)
      - packaging and signing with release-build*
      
      Possible follow up work:
      - release-build update to publish to PyPI (SPARK-18128)
      - figure out who owns the pyspark package name on prod PyPI (is it someone within the project, should we ask PyPI, or should we publish under a different name such as ApachePySpark?)
      - Windows support and/or testing (SPARK-18136)
      - investigate the details of wheel caching and see whether we can avoid cleaning the wheel cache during our tests
      - consider how we want to number our dev/snapshot versions
      
      Explicitly out of scope:
      - Using pip installed PySpark to start a standalone cluster
      - Using pip installed PySpark for non-Python Spark programs
      
      *I've done some work to test release-build, but as a non-committer I could only test it locally.
      ## How was this patch tested?
      
      Automated testing with virtualenv, manual testing with conda, a system wide install, and YARN integration.
      
      release-build changes were tested locally as a non-committer (uploading artifacts to the Apache staging websites was not tested).
      
      Author: Holden Karau <holden@us.ibm.com>
      Author: Juliet Hougland <juliet@cloudera.com>
      Author: Juliet Hougland <not@myemail.com>
      
      Closes #15659 from holdenk/SPARK-1267-pip-install-pyspark.
    • Takuya UESHIN
      [SPARK-18442][SQL] Fix nullability of WrapOption. · 95157938
      Takuya UESHIN authored
      
      ## What changes were proposed in this pull request?
      
      The nullability of `WrapOption` should be `false`.
      
      ## How was this patch tested?
      
      Existing tests.
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #15887 from ueshin/issues/SPARK-18442.
      
      (cherry picked from commit 170eeb34)
      Signed-off-by: Wenchen Fan <wenchen@databricks.com>
    • Artur Sukhenko
      [YARN][DOC] Increasing NodeManager's heap size with External Shuffle Service · 523abfe1
      Artur Sukhenko authored
      ## What changes were proposed in this pull request?
      
      Suggest that users increase the `NodeManager`'s heap size when the `External Shuffle Service` is enabled, because the `NM` can spend a lot of time in GC, which makes shuffle operations a bottleneck as `Shuffle Read blocked time` goes up.
      GC can also make the `NodeManager` use an enormous amount of CPU, degrading cluster performance.
      I have seen a NodeManager using 5-13 GB of RAM and up to 2700% CPU with the `spark_shuffle` service on.
      
      ## How was this patch tested?
      
      #### Added step 5:
      ![shuffle_service](https://cloud.githubusercontent.com/assets/15244468/20355499/2fec0fde-ac2a-11e6-8f8b-1c80daf71be1.png)
      
      Author: Artur Sukhenko <artur.sukhenko@gmail.com>
      
      Closes #15906 from Devian-ua/nmHeapSize.
      
      (cherry picked from commit 55589987)
      Signed-off-by: Reynold Xin <rxin@databricks.com>
    • Tathagata Das
      [SPARK-18461][DOCS][STRUCTUREDSTREAMING] Added more information about monitoring streaming queries · 3d4756d5
      Tathagata Das authored
      ## What changes were proposed in this pull request?
      <img width="941" alt="screen shot 2016-11-15 at 6 27 32 pm" src="https://cloud.githubusercontent.com/assets/663212/20332521/4190b858-ab61-11e6-93a6-4bdc05105ed9.png">
      <img width="940" alt="screen shot 2016-11-15 at 6 27 45 pm" src="https://cloud.githubusercontent.com/assets/663212/20332525/44a0d01e-ab61-11e6-8668-47f925490d4f.png">
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #15897 from tdas/SPARK-18461.
      
      (cherry picked from commit bb6cdfd9)
      Signed-off-by: Michael Armbrust <michael@databricks.com>
    • Tathagata Das
      [SPARK-18459][SPARK-18460][STRUCTUREDSTREAMING] Rename triggerId to batchId... · b86e962c
      Tathagata Das authored
      [SPARK-18459][SPARK-18460][STRUCTUREDSTREAMING] Rename triggerId to batchId and add triggerDetails to json in StreamingQueryStatus
      
      ## What changes were proposed in this pull request?
      
      SPARK-18459: The name triggerId suggests a number that increases with every trigger, whether or not the trigger contains data. However, triggerId actually increases only when a trigger contains a batch of data, so it is better to rename it to batchId.
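      A toy model of the distinction above, in plain Scala: a trigger counter would advance on every trigger, while a batch ID advances only when a trigger actually carries data. This is not `StreamExecution` code; `BatchIdSketch` and the representation of each trigger as a count of new records are hypothetical.

      ```scala
      // Hypothetical model: each trigger is represented by the number of new records it saw.
      object BatchIdSketch {
        /** For every trigger, the batch ID it would get, or None if it had no data. */
        def assignBatchIds(recordsPerTrigger: Seq[Int]): Seq[Option[Long]] = {
          var nextBatchId = 0L
          recordsPerTrigger.map { n =>
            if (n > 0) {
              val id = nextBatchId
              nextBatchId += 1 // advances only for triggers that carry data
              Some(id)
            } else {
              None // an empty trigger does not consume a batch ID
            }
          }
        }
      }
      ```

      For triggers with `Seq(5, 0, 3, 0, 7)` records, the five trigger indices are 0 to 4, but only three batch IDs (0, 1, 2) are assigned, which is why "batchId" describes the value more accurately.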
      
      SPARK-18460: triggerDetails was missing from the JSON representation. This PR adds it.
      
      ## How was this patch tested?
      Updated existing unit tests.
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #15895 from tdas/SPARK-18459.
      
      (cherry picked from commit 0048ce7c)
      Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
    • gatorsmile
      [SPARK-18415][SQL] Weird Plan Output when CTE used in RunnableCommand · c0dbe08d
      gatorsmile authored
      
      ### What changes were proposed in this pull request?
      Currently, when a CTE is used in a RunnableCommand, the Analyzer does not replace the logical node `With`, and the child plan of the RunnableCommand is not resolved. As a result, the output of the `With` plan node is very confusing.
      For example,
      ```
      sql(
        """
          |CREATE VIEW cte_view AS
          |WITH w AS (SELECT 1 AS n), cte1 (select 2), cte2 as (select 3)
          |SELECT n FROM w
        """.stripMargin).explain()
      ```
      The output looks like this:
      ```
      ExecutedCommand
         +- CreateViewCommand `cte_view`, WITH w AS (SELECT 1 AS n), cte1 (select 2), cte2 as (select 3)
      SELECT n FROM w, false, false, PersistedView
               +- 'With [(w,SubqueryAlias w
      +- Project [1 AS n#16]
         +- OneRowRelation$
      ), (cte1,'SubqueryAlias cte1
      +- 'Project [unresolvedalias(2, None)]
         +- OneRowRelation$
      ), (cte2,'SubqueryAlias cte2
      +- 'Project [unresolvedalias(3, None)]
         +- OneRowRelation$
      )]
                  +- 'Project ['n]
                     +- 'UnresolvedRelation `w`
      ```
      After the fix, the output is as shown below.
      ```
      ExecutedCommand
         +- CreateViewCommand `cte_view`, WITH w AS (SELECT 1 AS n), cte1 (select 2), cte2 as (select 3)
      SELECT n FROM w, false, false, PersistedView
               +- CTE [w, cte1, cte2]
                  :  :- SubqueryAlias w
                  :  :  +- Project [1 AS n#16]
                  :  :     +- OneRowRelation$
                  :  :- 'SubqueryAlias cte1
                  :  :  +- 'Project [unresolvedalias(2, None)]
                  :  :     +- OneRowRelation$
                  :  +- 'SubqueryAlias cte2
                  :     +- 'Project [unresolvedalias(3, None)]
                  :        +- OneRowRelation$
                  +- 'Project ['n]
                     +- 'UnresolvedRelation `w`
      ```
      
      BTW, this PR also fixes the output of the view type.
      
      ### How was this patch tested?
      Manual
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #15854 from gatorsmile/cteName.
      
      (cherry picked from commit 608ecc51)
      Signed-off-by: Herman van Hovell <hvanhovell@databricks.com>
    • Xianyang Liu
      [SPARK-18420][BUILD] Fix the errors caused by lint check in Java · b0ae8712
      Xianyang Liu authored
      
      Small fix: resolve the errors reported by the Java lint check.
      
      - Remove unused objects and unused imports (`UnusedImports`).
      - Add comments around the `finalize` method of `NioBufferedFileInputStream` to turn off checkstyle for it.
      - Split lines longer than 100 characters into two lines.
      
      Tested with Travis CI:
      ```
      $ build/mvn -T 4 -q -DskipTests -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive -Phive-thriftserver install
      $ dev/lint-java
      ```
      Before:
      ```
      Checkstyle checks failed at following occurrences:
      [ERROR] src/main/java/org/apache/spark/network/util/TransportConf.java:[21,8] (imports) UnusedImports: Unused import - org.apache.commons.crypto.cipher.CryptoCipherFactory.
      [ERROR] src/test/java/org/apache/spark/network/sasl/SparkSaslSuite.java:[516,5] (modifier) RedundantModifier: Redundant 'public' modifier.
      [ERROR] src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java:[133] (coding) NoFinalizer: Avoid using finalizer method.
      [ERROR] src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeMapData.java:[71] (sizes) LineLength: Line is longer than 100 characters (found 113).
      [ERROR] src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java:[112] (sizes) LineLength: Line is longer than 100 characters (found 110).
      [ERROR] src/test/java/org/apache/spark/sql/catalyst/expressions/HiveHasherSuite.java:[31,17] (modifier) ModifierOrder: 'static' modifier out of order with the JLS suggestions.
      [ERROR]src/main/java/org/apache/spark/examples/ml/JavaLogisticRegressionWithElasticNetExample.java:[64] (sizes) LineLength: Line is longer than 100 characters (found 103).
      [ERROR] src/main/java/org/apache/spark/examples/ml/JavaInteractionExample.java:[22,8] (imports) UnusedImports: Unused import - org.apache.spark.ml.linalg.Vectors.
      [ERROR] src/main/java/org/apache/spark/examples/ml/JavaInteractionExample.java:[51] (regexp) RegexpSingleline: No trailing whitespace allowed.
      ```
      
      After:
      ```
      $ build/mvn -T 4 -q -DskipTests -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive -Phive-thriftserver install
      $ dev/lint-java
      Using `mvn` from path: /home/travis/build/ConeyLiu/spark/build/apache-maven-3.3.9/bin/mvn
      Checkstyle checks passed.
      ```
      
      Author: Xianyang Liu <xyliu0530@icloud.com>
      
      Closes #15865 from ConeyLiu/master.
      
      (cherry picked from commit 7569cf6c)
      Signed-off-by: Sean Owen <sowen@cloudera.com>
    • Zheng RuiFeng
      [SPARK-18446][ML][DOCS] Add links to API docs for ML algos · 416bc3dd
      Zheng RuiFeng authored
      
      ## What changes were proposed in this pull request?
      Add links to API docs for ML algos
      ## How was this patch tested?
      Manual checking for the API links
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #15890 from zhengruifeng/algo_link.
      
      (cherry picked from commit a75e3fe9)
      Signed-off-by: Sean Owen <sowen@cloudera.com>
    • Zheng RuiFeng
      [SPARK-18434][ML] Add missing ParamValidations for ML algos · 6b6eb4e5
      Zheng RuiFeng authored
      
      ## What changes were proposed in this pull request?
      Add missing ParamValidations for ML algos
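      In Spark ML, a `Param` is declared with an `isValid` function, commonly built with helpers from `ParamValidators` such as `gtEq` and `inRange`. The sketch below mimics that pattern in plain Scala so it runs without Spark; `ParamValidationSketch`, `DoubleParam`, and the local `gtEq`/`inRange` are hypothetical stand-ins, not Spark's classes.

      ```scala
      object ParamValidationSketch {
        // Validator factories in the style of Spark ML's ParamValidators (local stand-ins).
        def gtEq(lower: Double): Double => Boolean = v => v >= lower
        def inRange(lower: Double, upper: Double): Double => Boolean =
          v => v >= lower && v <= upper

        // A minimal parameter that rejects invalid values when they are validated.
        final case class DoubleParam(name: String, isValid: Double => Boolean) {
          def validate(value: Double): Double = {
            require(isValid(value), s"$name given invalid value $value.")
            value
          }
        }

        val regParam: DoubleParam = DoubleParam("regParam", gtEq(0.0))
        val elasticNetParam: DoubleParam = DoubleParam("elasticNetParam", inRange(0.0, 1.0))
      }
      ```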
      ## How was this patch tested?
      existing tests
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #15881 from zhengruifeng/arg_checking.
      
      (cherry picked from commit c68f1a38)
      Signed-off-by: Yanbo Liang <ybliang8@gmail.com>
    • Weiqing Yang
      [MINOR][DOC] Fix typos in the 'configuration', 'monitoring' and... · 82084700
      Weiqing Yang authored
      [MINOR][DOC] Fix typos in the 'configuration', 'monitoring' and 'sql-programming-guide' documentation
      
      ## What changes were proposed in this pull request?
      
      Fix typos in the 'configuration', 'monitoring' and 'sql-programming-guide' documentation.
      
      ## How was this patch tested?
      Manually.
      
      Author: Weiqing Yang <yangweiqing001@gmail.com>
      
      Closes #15886 from weiqingy/fixTypo.
      
      (cherry picked from commit 241e04bc)
      Signed-off-by: Sean Owen <sowen@cloudera.com>
    • uncleGen
      [SPARK-18410][STREAMING] Add structured kafka example · 6b2301b8
      uncleGen authored
      
      ## What changes were proposed in this pull request?
      
      This PR adds Structured Streaming Kafka word count examples.
      
      ## How was this patch tested?
      
      Author: uncleGen <hustyugm@gmail.com>
      
      Closes #15849 from uncleGen/SPARK-18410.
      
      (cherry picked from commit e6145772)
      Signed-off-by: Sean Owen <sowen@cloudera.com>
    • Sean Owen
      [SPARK-18400][STREAMING] NPE when resharding Kinesis Stream · a94659ce
      Sean Owen authored
      
      ## What changes were proposed in this pull request?
      
      Avoid an NPE in KinesisRecordProcessor when shutdown happens without a successful init.
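      A hypothetical sketch of the kind of guard described above: state that is assigned only during initialization must not be dereferenced on shutdown if initialization never succeeded. `GuardedProcessor` is illustrative only and is not the real `KinesisRecordProcessor`; it uses an `Option` where Java-style code would use a null check.

      ```scala
      // Hypothetical processor; not the real KinesisRecordProcessor.
      class GuardedProcessor {
        // Assigned only by a successful initialize(); None otherwise.
        private var shardId: Option[String] = None

        def initialize(id: String): Unit = { shardId = Some(id) }

        /** Describes the shutdown action without ever dereferencing missing state. */
        def shutdown(): String = shardId match {
          case Some(id) => s"checkpointing shard $id"
          case None     => "shutdown before successful init; nothing to checkpoint"
        }
      }
      ```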
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #15882 from srowen/SPARK-18400.
      
      (cherry picked from commit 43a26899)
      Signed-off-by: Sean Owen <sowen@cloudera.com>
    • Liwei Lin
      [DOC][MINOR] Kafka doc: breakup into lines · 4567db9d
      Liwei Lin authored
      ## Before
      
      ![before](https://cloud.githubusercontent.com/assets/15843379/20340231/99b039fe-ac1b-11e6-9ba9-b44582427459.png)
      
      ## After
      
      ![after](https://cloud.githubusercontent.com/assets/15843379/20340236/9d5796e2-ac1b-11e6-92bb-6da40ba1a383.png)
      
      Author: Liwei Lin <lwlin7@gmail.com>
      
      Closes #15903 from lw-lin/kafka-doc-lines.
      
      (cherry picked from commit 3e01f128)
      Signed-off-by: Sean Owen <sowen@cloudera.com>
    • Dongjoon Hyun
      [SPARK-18433][SQL] Improve DataSource option keys to be more case-insensitive · b18c5a9b
      Dongjoon Hyun authored
      
      ## What changes were proposed in this pull request?
      
      This PR aims to make DataSource option keys more consistently case-insensitive.
      
      DataSource uses CaseInsensitiveMap in only part of its code path. For example, the following fails to find `url`:
      
      ```scala
      val df = spark.createDataFrame(sparkContext.parallelize(arr2x2), schema2)
      df.write.format("jdbc")
          .option("UrL", url1)
          .option("dbtable", "TEST.SAVETEST")
          .options(properties.asScala)
          .save()
      ```
      
      This PR makes DataSource options use CaseInsensitiveMap internally, and also makes DataSource use CaseInsensitiveMap generally, except in `InMemoryFileIndex` and `InsertIntoHadoopFsRelationCommand`. We cannot pass a CaseInsensitiveMap to those two because they create new case-sensitive Hadoop configurations internally by calling `newHadoopConfWithOptions(options)`.
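      The core idea can be sketched in plain Scala by normalizing keys once and lowercasing every lookup. `CaseInsensitiveOptions` is a hypothetical stand-in, not Spark's `CaseInsensitiveMap`, and the JDBC URL below is an example value.

      ```scala
      import java.util.Locale

      // Minimal stand-in for a case-insensitive option map (not Spark's CaseInsensitiveMap).
      final class CaseInsensitiveOptions(original: Map[String, String]) {
        // Normalize keys once so lookups ignore case.
        private val normalized: Map[String, String] =
          original.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }

        def get(key: String): Option[String] = normalized.get(key.toLowerCase(Locale.ROOT))

        def contains(key: String): Boolean = normalized.contains(key.toLowerCase(Locale.ROOT))
      }
      ```

      With this, an option written as `"UrL"` is found by a lookup for `"url"`. `Locale.ROOT` keeps the lowercasing independent of the JVM's default locale.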
      
      ## How was this patch tested?
      
      Passes the Jenkins tests, including newly added test cases.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #15884 from dongjoon-hyun/SPARK-18433.
      
      (cherry picked from commit 74f5c217)
      Signed-off-by: Wenchen Fan <wenchen@databricks.com>
    • Yanbo Liang
      [SPARK-18438][SPARKR][ML] spark.mlp should support RFormula. · 7b57e480
      Yanbo Liang authored
      
      ## What changes were proposed in this pull request?
      `spark.mlp` should support `RFormula` like other ML algorithm wrappers.
      I also did some cleanup and improvements for `spark.mlp`.
      
      ## How was this patch tested?
      Unit tests.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #15883 from yanboliang/spark-18438.
      
      (cherry picked from commit 95eb06bd)
      Signed-off-by: Yanbo Liang <ybliang8@gmail.com>
  2. Nov 15, 2016
  3. Nov 14, 2016
    • gatorsmile
      [SPARK-18430][SQL] Fixed Exception Messages when Hitting an Invocation Exception of Function Lookup · a0125fd6
      gatorsmile authored
      ### What changes were proposed in this pull request?
      When the exception is an invocation exception during function lookup, we return a useless/confusing error message:
      
      For example,
      ```Scala
      df.selectExpr("concat_ws()")
      ```
      Below is the error message we got:
      ```
      null; line 1 pos 0
      org.apache.spark.sql.AnalysisException: null; line 1 pos 0
      ```
      
      To get a meaningful error message, we need to get the cause. The fix is exactly the same as what we did in https://github.com/apache/spark/pull/12136. After the fix, the message we get is the exception thrown in the constructor of the function implementation:
      ```
      requirement failed: concat_ws requires at least one argument.; line 1 pos 0
      org.apache.spark.sql.AnalysisException: requirement failed: concat_ws requires at least one argument.; line 1 pos 0
      ```
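      Why the original message was `null`: when a constructor invoked through Java reflection throws, `Constructor.newInstance` wraps the exception in an `InvocationTargetException`, whose own message is null; the useful message lives in its cause. A plain-Scala sketch of unwrapping the cause, with a hypothetical `ConcatWsLike` class standing in for a function implementation (this is not Spark's `FunctionRegistry` code):

      ```scala
      import java.lang.reflect.InvocationTargetException

      // Hypothetical function implementation whose constructor validates its arguments.
      class ConcatWsLike(args: Seq[String]) {
        require(args.nonEmpty, "concat_ws requires at least one argument.")
      }

      object CauseUnwrapSketch {
        /** Instantiates ConcatWsLike reflectively and reports the underlying error message. */
        def lookup(args: Seq[String]): Either[String, ConcatWsLike] = {
          val ctor = classOf[ConcatWsLike].getConstructor(classOf[Seq[_]])
          try Right(ctor.newInstance(args))
          catch {
            // The wrapper's getMessage is null; the cause carries the real message.
            case e: InvocationTargetException => Left(e.getCause.getMessage)
          }
        }
      }
      ```

      Calling `CauseUnwrapSketch.lookup(Seq.empty)` yields the `require` message, prefixed with `requirement failed: ` by Scala's `require`.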
      
      ### How was this patch tested?
      Added test cases.
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #15878 from gatorsmile/functionNotFound.
      
      (cherry picked from commit 86430cc4)
      Signed-off-by: Reynold Xin <rxin@databricks.com>
    • Zheng RuiFeng
      [SPARK-18428][DOC] Update docs for GraphX · 649c15fa
      Zheng RuiFeng authored
      
      ## What changes were proposed in this pull request?
      1. Add links to `VertexRDD` and `EdgeRDD`
      2. Note in `Vertex and Edge RDDs` that not all methods are listed
      3. `VertexID` -> `VertexId`
      
      ## How was this patch tested?
      No tests; only docs are modified.
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #15875 from zhengruifeng/update_graphop_doc.
      
      (cherry picked from commit c31def1d)
      Signed-off-by: Reynold Xin <rxin@databricks.com>
    • Michael Armbrust
      [SPARK-18124] Observed delay based Event Time Watermarks · 27999b36
      Michael Armbrust authored
      
      This PR adds a new method `withWatermark` to the `Dataset` API, which can be used to specify an _event time watermark_. An event time watermark allows the streaming engine to reason about the point in time after which we no longer expect to see late data. This PR also augments `StreamExecution` to use this watermark for several purposes:
        - To know when a given time window aggregation is finalized and thus results can be emitted when using output modes that do not allow updates (e.g. `Append` mode).
        - To minimize the amount of state that we need to keep for ongoing aggregations, by evicting state for groups that are no longer expected to change. However, we still maintain all state if the query requires it (i.e. if the event time is not present in the `groupBy`, or when running in `Complete` mode).
      
      An example that emits windowed counts of records, waiting up to 5 minutes for late data to arrive.
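      ```scala
      import org.apache.spark.sql.functions.window
      import spark.implicits._ // for the $"..." syntax; assumes a SparkSession named `spark`
      
      df.withWatermark("eventTime", "5 minutes")
        .groupBy(window($"eventTime", "1 minute") as 'window)
        .count()
        .writeStream
        .format("console")
        .outputMode("append") // In append mode, we only output finalized aggregations.
        .start()
      ```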
      
      ### Calculating the watermark.
      The watermark is computed as the `MAX(eventTime)` seen this epoch across all of the partitions in the query, minus a user-defined _delayThreshold_. An additional constraint is that the watermark must increase monotonically.
      
      Note that since we must coordinate this value across partitions occasionally, the actual watermark used is only guaranteed to be at least `delay` behind the actual event time. In some cases, we may still process records that arrive more than `delay` late.
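      A simplified sketch of the calculation described above, in plain Scala with timestamps as epoch milliseconds. It is not the `StreamExecution` implementation; it assumes the per-partition maxima for the epoch have already been collected in one place, and `WatermarkSketch` is a hypothetical name.

      ```scala
      object WatermarkSketch {
        /**
         * Next watermark: max event time seen this epoch across all partitions,
         * minus the delay threshold, never moving backwards.
         */
        def nextWatermark(
            currentWatermarkMs: Long,
            maxEventTimePerPartitionMs: Seq[Long],
            delayThresholdMs: Long): Long = {
          if (maxEventTimePerPartitionMs.isEmpty) {
            currentWatermarkMs // no new data this epoch: keep the old watermark
          } else {
            val candidate = maxEventTimePerPartitionMs.max - delayThresholdMs
            math.max(currentWatermarkMs, candidate) // the watermark must increase monotonically
          }
        }
      }
      ```

      With a 5 minute delay (300,000 ms) and partition maxima of 600,000 and 900,000 ms, the watermark becomes 600,000 ms; a later epoch whose maximum is only 700,000 ms leaves it at 600,000 ms rather than moving it back.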
      
      This mechanism was chosen for the initial implementation over processing time for two reasons:
        - it is robust to downtime that could affect processing delay
        - it does not require syncing of time or timezones between the producer and the processing engine.
      
      ### Other notable implementation details
       - A new trigger metric `eventTimeWatermark` outputs the current value of the watermark.
       - We mark the event time column in the `Attribute` metadata using the key `spark.watermarkDelay`.  This allows downstream operations to know which column holds the event time.  Operations like `window` propagate this metadata.
       - `explain()` marks the watermark with a suffix of `-T${delayMs}` to ease debugging of how this information is propagated.
       - Currently, we don't filter out late records, but instead rely on the state store to avoid emitting records that are both added and filtered in the same epoch.
      
      ### Remaining in this PR
       - [ ] The test for recovery is currently failing as we don't record the watermark used in the offset log.  We will need to do so to ensure determinism, but this is deferred until #15626 is merged.
      
      ### Other follow-ups
      There are some natural additional features that we should consider for future work:
       - Ability to write records that arrive too late to some external store in case any out-of-band remediation is required.
       - `Update` mode so you can get partial results before a group is evicted.
       - Other mechanisms for calculating the watermark.  In particular a watermark based on quantiles would be more robust to outliers.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #15702 from marmbrus/watermarks.
      
      (cherry picked from commit c0718782)
      Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    • Nattavut Sutyanyong
      [SPARK-17348][SQL] Incorrect results from subquery transformation · ae66799f
      Nattavut Sutyanyong authored
      
      ## What changes were proposed in this pull request?
      
      Return an Analysis exception when there is a correlated non-equality predicate in a subquery and the correlated column from the outer reference is not from the immediate parent operator of the subquery. This PR prevents incorrect results from subquery transformation in such case.
      
      Test cases, both positive and negative tests, are added.
      
      ## How was this patch tested?
      
      sql/test, catalyst/test, hive/test, and scenarios that produce incorrect results without this PR and produce correct results when subquery transformation does happen.
      
      Author: Nattavut Sutyanyong <nsy.can@gmail.com>
      
      Closes #15763 from nsyca/spark-17348.
      
      (cherry picked from commit bd85603b)
      Signed-off-by: Herman van Hovell <hvanhovell@databricks.com>
    • Zheng RuiFeng
      [SPARK-11496][GRAPHX][FOLLOWUP] Add param checking for runParallelPersonalizedPageRank · cff7a70b
      Zheng RuiFeng authored
      
      ## What changes were proposed in this pull request?
      Add param checking to keep this in line with other algorithms.
      
      ## How was this patch tested?
      existing tests
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #15876 from zhengruifeng/param_check_runParallelPersonalizedPageRank.
      
      (cherry picked from commit 75934457)
      Signed-off-by: DB Tsai <dbtsai@dbtsai.com>
    • cody koeninger
      [SPARK-17510][STREAMING][KAFKA] config max rate on a per-partition basis · db691f05
      cody koeninger authored
      
      ## What changes were proposed in this pull request?
      
      Allow configuration of max rate on a per-topicpartition basis.
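      One way to picture a per-topic-partition limit is a function from partition to maximum records per second, with a default fallback. This plain-Scala sketch has no Kafka dependency; `TopicPartitionKey`, `PerPartitionRate`, and the override map are hypothetical names, not the Kafka integration's API.

      ```scala
      // Hypothetical stand-in for Kafka's TopicPartition so the sketch has no dependencies.
      final case class TopicPartitionKey(topic: String, partition: Int)

      // Max rate (records/sec) per partition, falling back to a single default value.
      final class PerPartitionRate(default: Long, overrides: Map[TopicPartitionKey, Long]) {
        def maxRatePerPartition(tp: TopicPartitionKey): Long = overrides.getOrElse(tp, default)

        /** Upper bound on records to read from `tp` in a batch lasting `batchSeconds`. */
        def maxMessages(tp: TopicPartitionKey, batchSeconds: Long): Long =
          maxRatePerPartition(tp) * batchSeconds
      }
      ```

      A hot partition can then be throttled to, say, 50 records/sec while every other partition keeps the default.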
      ## How was this patch tested?
      
      Unit tests.
      
      The reporter (Jeff Nadler) said he could test on his workload, so let's wait on that report.
      
      Author: cody koeninger <cody@koeninger.org>
      
      Closes #15132 from koeninger/SPARK-17510.
      
      (cherry picked from commit 89d1fa58)
      Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
    • Tathagata Das
      [SPARK-18416][STRUCTURED STREAMING] Fixed temp file leak in state store · 3c623d22
      Tathagata Das authored
      
      ## What changes were proposed in this pull request?
      
      StateStore.get() causes temporary files to be created immediately, even if the store is not used to make updates for a new version. The temp file is never closed because store.commit() is not called in those cases, so the output stream to the temp file stays open forever.
      
      This PR fixes it by opening the temp file only when there are updates being made.
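      A minimal sketch of the lazy-open pattern described above: the output stream is created on the first update, so a store that is fetched but never updated never creates (and never leaks) a temp file. `LazyTempFileStore` and its methods are hypothetical and are not the state store's actual API.

      ```scala
      import java.io.{File, FileOutputStream, OutputStream}

      final class LazyTempFileStore(dir: File) {
        private var tempFile: Option[File] = None
        private var out: Option[OutputStream] = None

        /** Opens the temp file on first use only. */
        private def stream(): OutputStream = out.getOrElse {
          val f = File.createTempFile("delta-", ".tmp", dir)
          val s = new FileOutputStream(f)
          tempFile = Some(f)
          out = Some(s)
          s
        }

        def put(bytes: Array[Byte]): Unit = stream().write(bytes)

        /** Closes the stream if one was opened; returns the written file, if any. */
        def commit(): Option[File] = {
          out.foreach(_.close())
          out = None
          tempFile
        }

        def hasTempFile: Boolean = tempFile.isDefined
      }
      ```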
      
      ## How was this patch tested?
      
      New unit test
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #15859 from tdas/SPARK-18416.
      
      (cherry picked from commit bdfe60ac)
      Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
    • Noritaka Sekiyama
      [SPARK-18432][DOC] Changed HDFS default block size from 64MB to 128MB · c07fe1c5
      Noritaka Sekiyama authored
      Changed HDFS default block size from 64MB to 128MB.
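      For a sense of scale: a file occupies ceil(size / blockSize) HDFS blocks, with the last block possibly partial. A quick plain-Scala check, assuming a 1 GiB file (`BlockCountSketch` is a hypothetical helper):

      ```scala
      object BlockCountSketch {
        val MiB: Long = 1024L * 1024L

        /** Number of blocks needed for a file: ceil(fileSize / blockSize). */
        def blocks(fileSizeBytes: Long, blockSizeBytes: Long): Long =
          (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes
      }
      ```

      A 1 GiB file needs 16 blocks at 64 MiB but only 8 at 128 MiB; since Spark by default creates one input partition per block of a file, the block size also affects the default number of partitions.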
      https://issues.apache.org/jira/browse/SPARK-18432
      
      
      
      Author: Noritaka Sekiyama <moomindani@gmail.com>
      
      Closes #15879 from moomindani/SPARK-18432.
      
      (cherry picked from commit 9d07ceee)
      Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
    • WangTaoTheTonic
      [SPARK-18396][HISTORYSERVER] Duration" column makes search result confused,... · 518dc1e1
      WangTaoTheTonic authored
      [SPARK-18396][HISTORYSERVER] Duration" column makes search result confused, maybe we should make it unsearchable
      
      ## What changes were proposed in this pull request?
      
      When we search data in the History Server, it checks whether any column contains the search string. Duration is represented as a long value in the table, so if we search for a simple string like "003" or "111", rows whose duration contains "003" or "111" are shown, which makes little sense to users.
      We cannot simply convert the long values to a readable format like "1 h" or "3.2 min", because the values are also used for sorting. A better way to handle this is to exclude the "Duration" column from searching.
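      A toy model of the behavior described above, showing why substring matching on a raw long value produces spurious hits and how excluding a column avoids them. It is only a model, not the History Server's implementation; `HistorySearchSketch` and the map-per-row representation are hypothetical, while the app ID, duration, and query come from the example below.

      ```scala
      object HistorySearchSketch {
        /** App IDs of rows where any searchable column contains `query`. */
        def search(
            rows: Seq[Map[String, String]],
            query: String,
            unsearchable: Set[String]): Seq[String] =
          rows.filter { row =>
            row.exists { case (column, value) =>
              !unsearchable.contains(column) && value.contains(query)
            }
          }.map(_("App ID"))
      }
      ```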
      
      ## How was this patch tested?
      
      Manual tests.
      
      Before ("local-1478225166651" passes the filter because its duration as a long value, "257244245", contains the search string "244"):
      ![before](https://cloud.githubusercontent.com/assets/5276001/20203166/f851ffc6-a7ff-11e6-8fe6-91a90ca92b23.jpg)
      
      After:
      ![after](https://cloud.githubusercontent.com/assets/5276001/20178646/2129fbb0-a78d-11e6-9edb-39f885ce3ed0.jpg)
      
      Author: WangTaoTheTonic <wangtao111@huawei.com>
      
      Closes #15838 from WangTaoTheTonic/duration.
      
      (cherry picked from commit 637a0bb8)
      Signed-off-by: Sean Owen <sowen@cloudera.com>
    • actuaryzhang
      [SPARK-18166][MLLIB] Fix Poisson GLM bug due to wrong requirement of response values · d554c02f
      actuaryzhang authored
      
      ## What changes were proposed in this pull request?
      
      The current implementation of Poisson GLM seems to allow only positive response values. This is incorrect, since the support of the Poisson distribution includes zero. The bug is easily fixed by changing the requirement on the Poisson response from `y > 0.0` to `y >= 0.0` in its `require` check.
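      Why zero must be accepted: for a Poisson variable with mean mu, P(Y = 0) = e^(-mu) > 0, so the per-observation log-likelihood y*log(mu) - mu - log(y!) is finite at y = 0 (it equals -mu). A plain-Scala sketch with the corrected check; `PoissonSketch` is hypothetical and is not the `GeneralizedLinearRegression` source, and it assumes integer-valued counts for the log-factorial.

      ```scala
      object PoissonSketch {
        /** Log-likelihood of one Poisson observation y with mean mu: y*log(mu) - mu - log(y!). */
        def logLikelihood(y: Double, mu: Double): Double = {
          // Corrected requirement: zero counts are valid Poisson responses.
          require(y >= 0.0, s"Poisson response must be non-negative, got $y")
          require(mu > 0.0, s"Poisson mean must be positive, got $mu")
          // log(y!) for integer-valued y, as a sum of logs (empty sum, 0.0, when y < 2)
          val logFactorial = (2 to y.toInt).map(k => math.log(k.toDouble)).sum
          y * math.log(mu) - mu - logFactorial
        }
      }
      ```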
      
      mengxr  srowen
      
      Author: actuaryzhang <actuaryzhang10@gmail.com>
      Author: actuaryzhang <actuaryzhang@uber.com>
      
      Closes #15683 from actuaryzhang/master.
      
      (cherry picked from commit ae6cddb7)
      Signed-off-by: Sean Owen <sowen@cloudera.com>
    • Sean Owen
      [SPARK-18382][WEBUI] "run at null:-1" in UI when no file/line info in call site info · 12bde11c
      Sean Owen authored
      
      ## What changes were proposed in this pull request?
      
      Avoid reporting a null file name or a -1 line number in call sites when a StackTraceElement does not carry this information.
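      `StackTraceElement.getFileName` returns null and `getLineNumber` returns a negative number when that information is unavailable, which is where "null:-1" comes from. A hypothetical formatter that omits missing parts; `CallSiteSketch` and its output format are illustrative and not Spark's `Utils.getCallSite` output.

      ```scala
      object CallSiteSketch {
        /** Formats a call site, omitting file/line info the stack frame does not carry. */
        def describe(methodName: String, frame: StackTraceElement): String = {
          val file = Option(frame.getFileName) // null when unavailable
          val line = frame.getLineNumber       // negative when unavailable
          (file, line) match {
            case (Some(f), l) if l >= 0 => s"$methodName at $f:$l"
            case (Some(f), _)           => s"$methodName at $f"
            case (None, _)              => s"$methodName at ${frame.getClassName}"
          }
        }
      }
      ```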
      
      ## How was this patch tested?
      
      Existing tests
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #15862 from srowen/SPARK-18382.
      
      (cherry picked from commit f95b124c)
      Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
  4. Nov 13, 2016
  5. Nov 12, 2016