  1. May 28, 2015
    • [SPARK-7198] [MLLIB] VectorAssembler should output ML attributes · 7859ab65
      Xiangrui Meng authored
      `VectorAssembler` should carry over ML attributes. For unknown attributes, we assume numeric values. This PR handles the following cases:
      
      1. DoubleType with ML attribute: carry over
      2. DoubleType without ML attribute: numeric value
      3. Scalar type: numeric value
      4. VectorType with all ML attributes: carry over and update names
      5. VectorType with only the number of ML attributes: assume all numeric
      6. VectorType without ML attributes: check the first row and get the number of attributes
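      A minimal sketch of how the carried-over attributes can be inspected (assuming a Spark 1.4-era `sqlContext` in scope, e.g. in `spark-shell`; column names are illustrative):
      
      ```scala
      import org.apache.spark.ml.attribute.AttributeGroup
      import org.apache.spark.ml.feature.VectorAssembler
      import sqlContext.implicits._
      
      // Two plain DoubleType columns without ML attributes (case 2 above):
      // both are assumed numeric, and their names flow into the output.
      val df = Seq((1.0, 2.0)).toDF("x", "y")
      val assembled = new VectorAssembler()
        .setInputCols(Array("x", "y"))
        .setOutputCol("features")
        .transform(df)
      
      // The output column's metadata now carries one ML attribute per input.
      val group = AttributeGroup.fromStructField(assembled.schema("features"))
      group.attributes.foreach(_.foreach(attr => println(attr.name.getOrElse("?"))))
      ```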
      
      jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6452 from mengxr/SPARK-7198 and squashes the following commits:
      
      a9d2469 [Xiangrui Meng] add space
      facdb1f [Xiangrui Meng] VectorAssembler should output ML attributes
    • [DOCS] Fixing broken "IDE setup" link in the Building Spark documentation. · 3e312a5e
      Mike Dusenberry authored
      The location of the IDE setup information has changed, so this just updates the link on the Building Spark page.
      
      Author: Mike Dusenberry <dusenberrymw@gmail.com>
      
      Closes #6467 from dusenberrymw/Fix_Broken_Link_On_Building_Spark_Doc and squashes the following commits:
      
      75c533a [Mike Dusenberry] Fixing broken "IDE setup" link in the Building Spark documentation by pointing to new location.
    • [MINOR] Fix a minor bug in the PageRank example. · c771589c
      Li Yao authored
      Fix the bug that entering only one argument causes an array-out-of-bounds exception in the PageRank example.
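      The shape of the fix, sketched (the default iteration count here is illustrative):
      
      ```scala
      // Guard the optional second argument instead of indexing args(1)
      // unconditionally, which throws when only the input file is given.
      val iters = if (args.length > 1) args(1).toInt else 10
      ```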
      
      Author: Li Yao <hnkfliyao@gmail.com>
      
      Closes #6455 from lastland/patch-1 and squashes the following commits:
      
      de06128 [Li Yao] Fix the bug that entering only 1 arg will cause array out of bounds exception.
    • [SPARK-7911] [MLLIB] A workaround for VectorUDT serialize (or deserialize)... · 530efe3e
      Xiangrui Meng authored
      [SPARK-7911] [MLLIB] A workaround for VectorUDT serialize (or deserialize) being called multiple times
      
      ~~A PythonUDT shouldn't be serialized into external Scala types in PythonRDD. I'm not sure whether this should fix one of the bugs related to SQL UDT/UDF in PySpark.~~
      
      The fix above didn't work, so I added a workaround instead: if a Python UDF is applied to a Python UDT, this will pass the Python SQL types as inputs. Still incorrect, but at least it doesn't throw exceptions on the Scala side. davies harsha2010
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6442 from mengxr/SPARK-7903 and squashes the following commits:
      
      c257d2a [Xiangrui Meng] add a workaround for VectorUDT
    • [SPARK-7895] [STREAMING] [EXAMPLES] Move Kafka examples from scala-2.10/src to src · 000df2f0
      zsxwing authored
      Since `spark-streaming-kafka` now is published for both Scala 2.10 and 2.11, we can move `KafkaWordCount` and `DirectKafkaWordCount` from `examples/scala-2.10/src/` to `examples/src/` so that they will appear in `spark-examples-***-jar` for Scala 2.11.
      
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6436 from zsxwing/SPARK-7895 and squashes the following commits:
      
      c6052f1 [zsxwing] Update examples/pom.xml
      0bcfa87 [zsxwing] Fix the sleep time
      b9d1256 [zsxwing] Move Kafka examples from scala-2.10/src to src
    • [SPARK-7782] fixed sort arrow issue · e838a25b
      zuxqoj authored
      Current behaviour:
      In the Spark UI
      ![screen shot 2015-05-27 at 3 27 51 pm](https://cloud.githubusercontent.com/assets/3919211/7837541/47d330ba-04a5-11e5-89d1-e5b11da1a513.png)
      
      In YARN
      ![screen shot 2015-05-27 at 3](https://cloud.githubusercontent.com/assets/3919211/7837594/aebd1d36-04a5-11e5-8216-86e03c07d2bd.png)
      
      In JIRA
      ![screen shot 2015-05-27 at 3_2](https://cloud.githubusercontent.com/assets/3919211/7837616/d3fedce2-04a5-11e5-9e68-960ed54e5d83.png)
      
      Author: zuxqoj <sbshekhar@gmail.com>
      
      Closes #6437 from zuxqoj/SPARK-7782_PR and squashes the following commits:
      
      cd068b9 [zuxqoj] [SPARK-7782] fixed sort arrow issue
    • [DOCS] Fix typo in documentation for Java UDF registration · 35410614
      Matt Wise authored
      This contribution is my original work and I license the work to the project under the project's open source license.
      
      Author: Matt Wise <mwise@quixey.com>
      
      Closes #6447 from wisematthew/fix-typo-in-java-udf-registration-doc and squashes the following commits:
      
      e7ef5f7 [Matt Wise] Fix typo in documentation for Java UDF registration
    • [SPARK-7896] Allow ChainedBuffer to store more than 2 GB · bd11b01e
      Sandy Ryza authored
      Author: Sandy Ryza <sandy@cloudera.com>
      
      Closes #6440 from sryza/sandy-spark-7896 and squashes the following commits:
      
      49d8a0d [Sandy Ryza] Fix bug introduced when reading over record boundaries
      6006856 [Sandy Ryza] Fix overflow issues
      006b4b2 [Sandy Ryza] Fix scalastyle by removing non ascii characters
      8b000ca [Sandy Ryza] Add ascii art to describe layout of data in metaBuffer
      f2053c0 [Sandy Ryza] Fix negative overflow issue
      0368c78 [Sandy Ryza] Initialize size as 0
      a5a4820 [Sandy Ryza] Use explicit types for all numbers in ChainedBuffer
      b7e0213 [Sandy Ryza] SPARK-7896. Allow ChainedBuffer to store more than 2 GB
  2. May 27, 2015
    • [SPARK-7873] Allow KryoSerializerInstance to create multiple streams at the same time · 852f4de2
      Josh Rosen authored
      This is a somewhat obscure bug, but I think it will seriously impact KryoSerializer users who use custom registrators that disable auto-reset. When auto-reset is disabled, this breaks things in some of our shuffle paths, which actually end up creating multiple OutputStreams from the same shared SerializerInstance (which is unsafe).
      
      This was introduced by a patch (SPARK-3386) which enables serializer re-use in some of the shuffle paths, since constructing new serializer instances is actually pretty costly for KryoSerializer.  We had already fixed another corner-case (SPARK-7766) bug related to this, but missed this one.
      
      I think that the root problem here is that KryoSerializerInstance can be used in a way which is unsafe even within a single thread, e.g. by creating multiple open OutputStreams from the same instance or by interleaving deserialize and deserializeStream calls. I considered a smaller patch which adds assertions to guard against this type of "misuse" but abandoned that approach after I realized how convoluted the Scaladoc became.
      
      This patch fixes this bug by making it legal to create multiple streams from the same KryoSerializerInstance.  Internally, KryoSerializerInstance now implements a  `borrowKryo()` / `releaseKryo()` API that's backed by a "pool" of capacity 1. Each call to a KryoSerializerInstance method will borrow the Kryo, do its work, then release the serializer instance back to the pool. If the pool is empty and we need an instance, it will allocate a new Kryo on-demand. This makes it safe for multiple OutputStreams to be opened from the same serializer. If we try to release a Kryo back to the pool but the pool already contains a Kryo, then we'll just discard the new Kryo. I don't think there's a clear benefit to having a larger pool since our usages tend to fall into two cases, a) where we only create a single OutputStream and b) where we create a huge number of OutputStreams with the same lifecycle, then destroy the KryoSerializerInstance (this is what's happening in the bypassMergeSort code path that my regression test hits).
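      A minimal sketch of the borrow/release pattern described above (an illustrative re-implementation, not Spark's actual code; only the `borrowKryo()`/`releaseKryo()` names come from the PR):
      
      ```scala
      import com.esotericsoftware.kryo.Kryo
      
      class PooledSerializerInstance(newKryo: () => Kryo) {
        // "Pool" of capacity 1: either caches a single Kryo or is empty.
        private var cached: Kryo = null
      
        def borrowKryo(): Kryo = synchronized {
          if (cached != null) { val k = cached; cached = null; k }
          else newKryo() // pool empty: allocate a fresh Kryo on demand
        }
      
        def releaseKryo(kryo: Kryo): Unit = synchronized {
          // If the pool already holds a Kryo, simply discard the extra one.
          if (cached == null) cached = kryo
        }
      
        // Every serializer method borrows, does its work, then releases, so
        // two open streams each get their own Kryo (hypothetical helper).
        def withKryo[T](work: Kryo => T): T = {
          val kryo = borrowKryo()
          try work(kryo) finally releaseKryo(kryo)
        }
      }
      ```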
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #6415 from JoshRosen/SPARK-7873 and squashes the following commits:
      
      00b402e [Josh Rosen] Initialize eagerly to fix a failing test
      ba55d20 [Josh Rosen] Add explanatory comments
      3f1da96 [Josh Rosen] Guard against duplicate close()
      ab457ca [Josh Rosen] Sketch a loan/release based solution.
      9816e8f [Josh Rosen] Add a failing test showing how deserialize() and deserializeStream() can interfere.
      7350886 [Josh Rosen] Add failing regression test for SPARK-7873
    • [SPARK-7907] [SQL] [UI] Rename tab ThriftServer to SQL. · 3c1f1baa
      Yin Huai authored
      This PR has three changes:
      1. Renaming the tab from `ThriftServer` to `SQL`;
      2. Renaming the title of the tab from `ThriftServer` to `JDBC/ODBC Server`; and
      3. Renaming the title of the session page from `ThriftServer` to `JDBC/ODBC Session`.
      
      https://issues.apache.org/jira/browse/SPARK-7907
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #6448 from yhuai/JDBCServer and squashes the following commits:
      
      eadcc3d [Yin Huai] Update test.
      9168005 [Yin Huai] Use SQL as the tab name.
      221831e [Yin Huai] Rename ThriftServer to JDBCServer.
    • [SPARK-7897][SQL] Use DecimalType to represent unsigned bigint in JDBCRDD · a1e092ea
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-7897
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6438 from viirya/jdbc_unsigned_bigint and squashes the following commits:
      
      ccb3c3f [Liang-Chi Hsieh] Use DecimalType to represent unsigned bigint.
    • [SPARK-7853] [SQL] Fixes a class loader issue in Spark SQL · db3fd054
      Cheng Hao authored
      This PR is based on PR #6396 authored by chenghao-intel. Essentially, Spark SQL should use the context classloader to load SerDe classes.
      
      yhuai helped update the test case, and I fixed a bug in the original `CliSuite`: while testing the CLI tool with `runCliWithin`, we don't append `\n` to the last query, thus the last query is never executed.
      
      Original PR description is pasted below.
      
      ----
      
      ```
      bin/spark-sql --jars ./sql/hive/src/test/resources/hive-hcatalog-core-0.13.1.jar
      CREATE TABLE t1(a string, b string) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';
      ```
      
      Throws exception like
      
      ```
      15/05/26 00:16:33 ERROR SparkSQLDriver: Failed in [CREATE TABLE t1(a string, b string) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe']
      org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot validate serde: org.apache.hive.hcatalog.data.JsonSerDe
              at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:333)
              at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:310)
              at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:139)
              at org.apache.spark.sql.hive.client.ClientWrapper.runHive(ClientWrapper.scala:310)
              at org.apache.spark.sql.hive.client.ClientWrapper.runSqlHive(ClientWrapper.scala:300)
              at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:457)
              at org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33)
              at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
              at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
              at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68)
              at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
              at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
              at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
              at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87)
              at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:922)
              at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:922)
              at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:147)
              at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:131)
              at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
              at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:727)
              at org.apache.spark.sql.hive.thriftserver.AbstractSparkSQLDriver.run(AbstractSparkSQLDriver.scala:57)
      ```
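      A sketch of the underlying idea: resolve user-supplied classes through the context class loader (which sees `--jars`) rather than whichever loader happened to load Spark's own classes. Names here are illustrative, not the actual patch:
      
      ```scala
      object SerDeLoading {
        // Fall back to this class's loader only when no context loader is set.
        def resolveSerDeClass(className: String): Class[_] = {
          val loader = Option(Thread.currentThread().getContextClassLoader)
            .getOrElse(getClass.getClassLoader)
          Class.forName(className, true, loader)
        }
      }
      ```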
      
      Author: Cheng Hao <hao.cheng@intel.com>
      Author: Cheng Lian <lian@databricks.com>
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #6435 from liancheng/classLoader and squashes the following commits:
      
      d4c4845 [Cheng Lian] Fixes CliSuite
      75e80e2 [Yin Huai] Update the fix.
      fd26533 [Cheng Hao] scalastyle
      dd78775 [Cheng Hao] workaround for classloader of IsolatedClientLoader
    • [SPARK-7684] [SQL] Refactoring MetastoreDataSourcesSuite to workaround SPARK-7684 · b97ddff0
      Cheng Lian authored
      As stated in SPARK-7684, `TestHive.reset` currently has an execution-order-specific bug, which makes running specific test suites locally pretty frustrating. This PR refactors `MetastoreDataSourcesSuite` (which relies on `TestHive.reset` heavily) using the various `withXxx` utility methods in `SQLTestUtils`, so that each test case cleans up its own mess and we can avoid calling `TestHive.reset`.
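      The `withXxx` pattern boils down to try/finally cleanup; a minimal sketch (assuming a `sqlContext` in scope):
      
      ```scala
      // Run a test body against a named table, then drop the table even if
      // the body throws, so no state leaks into the next test case.
      def withTable(tableName: String)(body: => Unit): Unit = {
        try body finally sqlContext.sql(s"DROP TABLE IF EXISTS $tableName")
      }
      ```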
      
      Author: Cheng Lian <lian@databricks.com>
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #6353 from liancheng/workaround-spark-7684 and squashes the following commits:
      
      26939aa [Yin Huai] Move the initialization of jsonFilePath to beforeAll.
      a423d48 [Cheng Lian] Fixes Scala style issue
      dfe45d0 [Cheng Lian] Refactors MetastoreDataSourcesSuite to workaround SPARK-7684
      92a116d [Cheng Lian] Fixes minor styling issues
    • [SPARK-7790] [SQL] date and decimal conversion for dynamic partition key · 8161562e
      Daoyuan Wang authored
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #6318 from adrian-wang/dynpart and squashes the following commits:
      
      ad73b61 [Daoyuan Wang] not use sqlTestUtils for try catch because dont have sqlcontext here
      6c33b51 [Daoyuan Wang] fix according to liancheng
      f0f8074 [Daoyuan Wang] some specific types as dynamic partition
    • Removed Guava dependency from JavaTypeInference's type signature. · 6fec1a94
      Reynold Xin authored
      This should also close #6243.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6431 from rxin/JavaTypeInference-guava and squashes the following commits:
      
      e58df3c [Reynold Xin] Removed Gauva dependency from JavaTypeInference's type signature.
    • [SPARK-7864] [UI] Fix the logic grabbing the link from table in AllJobPage · 0db76c90
      Kousuke Saruta authored
      This issue is related to #6419.
      Now AllJobPage doesn't have a "kill link", but I think we should fix the issue mentioned in #6419 anyway, just in case, to avoid accidents in the future.
      
      So it's a minor issue for now, and I haven't filed it in JIRA.
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #6432 from sarutak/remove-ambiguity-of-link and squashes the following commits:
      
      cd1a503 [Kousuke Saruta] Fixed ambiguity link issue in AllJobPage
    • [SPARK-7847] [SQL] Fixes dynamic partition directory escaping · 15459db4
      Cheng Lian authored
      Please refer to [SPARK-7847] [1] for details.
      
      [1]: https://issues.apache.org/jira/browse/SPARK-7847
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6389 from liancheng/spark-7847 and squashes the following commits:
      
      935c652 [Cheng Lian] Adds test case for writing various data types as dynamic partition value
      f4fc398 [Cheng Lian] Converts partition columns to Scala type when writing dynamic partitions
      d0aeca0 [Cheng Lian] Fixes dynamic partition directory escaping
    • [SPARK-7878] Rename Stage.jobId to firstJobId · ff0ddff4
      Kay Ousterhout authored
      The previous name was confusing, because each stage can be associated with
      many jobs, and jobId is just the ID of the first job that was associated
      with the Stage. This commit also renames some of the method parameters in
      DAGScheduler.scala to clarify when the jobId refers to the first job ID
      associated with the stage (as opposed to the jobId associated with a job
      that's currently being scheduled).
      
      cc markhamstra JoshRosen (hopefully this will help prevent future bugs like SPARK-6880)
      
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      Closes #6418 from kayousterhout/SPARK-7878 and squashes the following commits:
      
      b71a9b8 [Kay Ousterhout] [SPARK-7878] Rename Stage.jobId to firstJobId
    • [CORE] [TEST] HistoryServerSuite failed due to timezone issue · 4615081d
      scwf authored
      Follow-up for #6377: change the time to the equivalent in GMT.
      /cc squito
      
      Author: scwf <wangfei1@huawei.com>
      
      Closes #6425 from scwf/fix-HistoryServerSuite and squashes the following commits:
      
      4d37935 [scwf] fix HistoryServerSuite
    • [SQL] Rename MathematicalExpression UnaryMathExpression, and specify... · 3e7d7d6b
      Reynold Xin authored
      [SQL] Rename MathematicalExpression UnaryMathExpression, and specify BinaryMathExpression's output data type as DoubleType.
      
      Two minor changes.
      
      cc brkyvz
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6428 from rxin/math-func-cleanup and squashes the following commits:
      
      5910df5 [Reynold Xin] [SQL] Rename MathematicalExpression UnaryMathExpression, and specify BinaryMathExpression's output data type as DoubleType.
    • [SPARK-7887][SQL] Remove EvaluatedType from SQL Expression. · 9f48bf6b
      Reynold Xin authored
      This type is not really used. Might as well remove it.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #6427 from rxin/evalutedType and squashes the following commits:
      
      51a319a [Reynold Xin] [SPARK-7887][SQL] Remove EvaluatedType from SQL Expression.
    • [SPARK-7697][SQL] Use LongType for unsigned int in JDBCRDD · 4f98d7a7
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-7697
      
      The reported problem case is MySQL, but the H2 database has no unsigned int type, so a corresponding test could not be added.
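      A sketch of the mapping idea behind this PR and SPARK-7897 above: an unsigned column can exceed the signed Catalyst type of the same width, so it is widened one step (illustrative, not the actual JDBCRDD code):
      
      ```scala
      import java.sql.Types
      import org.apache.spark.sql.types._
      
      def catalystType(sqlType: Int, signed: Boolean): DataType = sqlType match {
        case Types.INTEGER if signed => IntegerType
        case Types.INTEGER           => LongType           // unsigned int needs 64 bits
        case Types.BIGINT if signed  => LongType
        case Types.BIGINT            => DecimalType(20, 0) // unsigned bigint (SPARK-7897)
        case _                       => StringType         // fallback for this sketch
      }
      ```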
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6229 from viirya/unsignedint_as_long and squashes the following commits:
      
      dc4b5d8 [Liang-Chi Hsieh] Merge remote-tracking branch 'upstream/master' into unsignedint_as_long
      608695b [Liang-Chi Hsieh] Use LongType for unsigned int in JDBCRDD.
    • [SPARK-7850][BUILD] Hive 0.12.0 profile in POM should be removed · 6dd64587
      Cheolsoo Park authored
      I grepped for hive-0.12.0 in the source code and removed all the profiles and doc references.
      
      Author: Cheolsoo Park <cheolsoop@netflix.com>
      
      Closes #6393 from piaozhexiu/SPARK-7850 and squashes the following commits:
      
      fb429ce [Cheolsoo Park] Remove hive-0.13.1 profile
      82bf09a [Cheolsoo Park] Remove hive 0.12.0 shim code
      f3722da [Cheolsoo Park] Remove hive-0.12.0 profile and references from POM and build docs
    • [SPARK-7535] [.1] [MLLIB] minor changes to the pipeline API · a9f1c0c5
      Xiangrui Meng authored
      1. removed `Params.validateParams(extra)`
      2. added `Evaluate.evaluate(dataset, paramPairs*)`
      3. updated `RegressionEvaluator` doc
      
      jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6392 from mengxr/SPARK-7535.1 and squashes the following commits:
      
      5ff5af8 [Xiangrui Meng] add unit test for CV.validateParams
      f1f8369 [Xiangrui Meng] update CV.validateParams() to test estimatorParamMaps
      607445d [Xiangrui Meng] merge master
      8716f5f [Xiangrui Meng] specify default metric name in RegressionEvaluator
      e4e5631 [Xiangrui Meng] update RegressionEvaluator doc
      801e864 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7535.1
      fcbd3e2 [Xiangrui Meng] Merge branch 'master' into SPARK-7535.1
      2192316 [Xiangrui Meng] remove validateParams(extra); add evaluate(dataset, extra*)
  3. May 26, 2015
    • [SPARK-7868] [SQL] Ignores _temporary directories in HadoopFsRelation · b463e6d6
      Cheng Lian authored
      This ensures that partial or corrupted data files left by failed tasks/jobs won't affect normal data scans.
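      A sketch of the check involved (illustrative; the real change lives in HadoopFsRelation's file listing):
      
      ```scala
      import org.apache.hadoop.fs.Path
      
      // A file is skipped when any ancestor directory is named "_temporary",
      // since that is where in-flight task output gets staged.
      def isTemporary(path: Path): Boolean =
        Iterator.iterate(path)(_.getParent)
          .takeWhile(_ != null)
          .exists(_.getName == "_temporary")
      ```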
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #6411 from liancheng/spark-7868 and squashes the following commits:
      
      273ea36 [Cheng Lian] Ignores _temporary directories
    • [SPARK-7858] [SQL] Use output schema, not relation schema, for data source input conversion · 0c33c7b4
      Josh Rosen authored
      In `DataSourceStrategy.createPhysicalRDD`, we use the relation schema as the target schema for converting incoming rows into Catalyst rows.  However, we should be using the output schema instead, since our scan might return a subset of the relation's columns.
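      A sketch of the distinction (names illustrative): the conversion schema has to be derived from the scan's projected output attributes, not from the relation:
      
      ```scala
      import org.apache.spark.sql.catalyst.expressions.Attribute
      import org.apache.spark.sql.types.{StructField, StructType}
      
      // Build the row-conversion schema from the columns the scan actually
      // returns; using relation.schema here breaks whenever only a subset
      // of the columns is requested.
      def conversionSchema(output: Seq[Attribute]): StructType =
        StructType(output.map(a => StructField(a.name, a.dataType, a.nullable)))
      ```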
      
      This patch incorporates #6414 by liancheng, which fixes an issue in `SimpleTextRelation` that prevented this bug from being caught by our old tests:
      
      > In `SimpleTextRelation`, we specified `needsConversion` to `true`, indicating that values produced by this testing relation should be of Scala types, and need to be converted to Catalyst types when necessary. However, we also used `Cast` to convert strings to expected data types. And `Cast` always produces values of Catalyst types, thus no conversion is done at all. This PR makes `SimpleTextRelation` produce Scala values so that data conversion code paths can be properly tested.
      
      Closes #5986.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      Author: Cheng Lian <lian@databricks.com>
      Author: Cheng Lian <liancheng@users.noreply.github.com>
      
      Closes #6400 from JoshRosen/SPARK-7858 and squashes the following commits:
      
      e71c866 [Josh Rosen] Re-fix bug so that the tests pass again
      56b13e5 [Josh Rosen] Add regression test to hadoopFsRelationSuites
      2169a0f [Josh Rosen] Remove use of SpecificMutableRow and BufferedIterator
      6cd7366 [Josh Rosen] Fix SPARK-7858 by using output types for conversion.
      5a00e66 [Josh Rosen] Add assertions in order to reproduce SPARK-7858
      8ba195c [Cheng Lian] Merge 9968fba9979287aaa1f141ba18bfb9d4c116a3b3 into 61664732
      9968fba [Cheng Lian] Tests the data type conversion code paths
    • [SPARK-7637] [SQL] O(N) merge implementation for StructType merge · 03668348
      rowan authored
      Contribution is my original work and I license the work to the project under the project's open source license.
      
      Author: rowan <rowan.chattaway@googlemail.com>
      
      Closes #6259 from rowan000/SPARK-7637 and squashes the following commits:
      
      c479df4 [rowan] SPARK-7637: rename mapFields to fieldsMap as per comments on github.
      8d2e419 [rowan] SPARK-7637: fix up whitespace changes
      0e9d662 [rowan] SPARK-7637: O(N) merge implementatio for StructType merge
    • [SPARK-7883] [DOCS] [MLLIB] Fixing broken trainImplicit Scala example in MLlib... · 0463428b
      Mike Dusenberry authored
      [SPARK-7883] [DOCS] [MLLIB] Fixing broken trainImplicit Scala example in MLlib Collaborative Filtering documentation.
      
      Fixing broken trainImplicit Scala example in MLlib Collaborative Filtering documentation to match one of the possible ALS.trainImplicit function signatures.
      
      Author: Mike Dusenberry <dusenberrymw@gmail.com>
      
      Closes #6422 from dusenberrymw/Fix_MLlib_Collab_Filtering_trainImplicit_Example and squashes the following commits:
      
      36492f4 [Mike Dusenberry] Fixing broken trainImplicit example in MLlib Collaborative Filtering documentation to match one of the possible ALS.trainImplicit function signatures.
    • [SPARK-7864] [UI] Do not kill innocent stages from visualization · 8f208242
      Andrew Or authored
      **Reproduction.** Run a long-running job, go to the job page, expand the DAG visualization, and click into a stage. Your stage is now killed. Why? This is because the visualization code just reaches into the stage table and grabs the first link it finds. In our case, this first link happens to be the kill link instead of the one to the stage page.
      
      **Fix.** Use proper CSS selectors to avoid ambiguity.
      
      This is an alternative to #6407. Thanks carsonwang for catching this.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #6419 from andrewor14/fix-ui-viz-kill and squashes the following commits:
      
      25203bd [Andrew Or] Do not kill innocent stages
    • [SPARK-7748] [MLLIB] Graduate spark.ml from alpha · 836a7589
      Xiangrui Meng authored
      With decent coverage of feature transformers, algorithms, and model tuning support, it is time to graduate `spark.ml` from alpha. This PR changes all `AlphaComponent` annotations to either `DeveloperApi` or `Experimental`, depending on whether we expect a class/method to be used by end users (who use the pipeline API to assemble/tune their ML pipelines but do not create new pipeline components). `UnaryTransformer` becomes a `DeveloperApi` in this PR.
      
      jkbradley harsha2010
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #6417 from mengxr/SPARK-7748 and squashes the following commits:
      
      effbccd [Xiangrui Meng] organize imports
      c15028e [Xiangrui Meng] added missing docs
      1b2e5f8 [Xiangrui Meng] update package doc
      73ca791 [Xiangrui Meng] alpha -> ex/dev for the rest
      93819db [Xiangrui Meng] alpha -> ex/dev in ml.param
      55ca073 [Xiangrui Meng] alpha -> ex/dev in ml.feature
      83572f1 [Xiangrui Meng] add Experimental and DeveloperApi tags (wip)
    • [SPARK-6602] [CORE] Remove some places in core that call SparkEnv.actorSystem · 9f742241
      zsxwing authored
      Author: zsxwing <zsxwing@gmail.com>
      
      Closes #6333 from zsxwing/remove-actor-system-usage and squashes the following commits:
      
      f125aa6 [zsxwing] Fix YarnAllocatorSuite
      ceadcf6 [zsxwing] Change the "port" parameter type of "AkkaUtils.address" to "int"; update ApplicationMaster and YarnAllocator to get the driverUrl from RpcEnv
      3239380 [zsxwing] Remove some places in core that calling SparkEnv.actorSystem
    • [SPARK-3674] YARN support in Spark EC2 · 2e9a5f22
      Shivaram Venkataraman authored
      This corresponds to https://github.com/mesos/spark-ec2/pull/116 in the spark-ec2 repo. The only change required in the spark_ec2.py script is to open the RM port.
      
      cc andrewor14
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #6376 from shivaram/spark-ec2-yarn and squashes the following commits:
      
      961504a [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into spark-ec2-yarn
      152c94c [Shivaram Venkataraman] Open 8088 for YARN in EC2
    • [SPARK-7844] [MLLIB] Fix broken tests in KernelDensity · 61664732
      MechCoder authored
      The densities in KernelDensity are scaled down by (number of parallel processes × number of points), when they should be scaled by just the number of samples. This resulted in broken tests in KernelDensitySuite, which hadn't been testing the values properly.
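      A minimal sketch of the corrected normalization (illustrative, not MLlib's implementation): the summed kernel contributions are divided by the sample count alone:
      
      ```scala
      // Gaussian pdf centered at `mean` with standard deviation `sd`.
      def normPdf(mean: Double, sd: Double, x: Double): Double = {
        val z = (x - mean) / sd
        math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.Pi))
      }
      
      // Density at each query point: the average kernel contribution over
      // the n samples. Dividing by anything else (e.g. n * #partitions)
      // systematically underestimates the density.
      def estimate(samples: Seq[Double], bandwidth: Double, points: Seq[Double]): Seq[Double] = {
        val n = samples.size
        points.map(x => samples.map(s => normPdf(s, bandwidth, x)).sum / n)
      }
      ```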
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #6383 from MechCoder/spark-7844 and squashes the following commits:
      
      ab81302 [MechCoder] Math->math
      9b8ed50 [MechCoder] Make one pass to update count
      a92fe50 [MechCoder] [SPARK-7844] Fix broken tests in KernelDensity
    • [SPARK-7854] [TEST] refine Kryo test suite · 63099122
      Zhang, Liye authored
      This modification follows JoshRosen's comments; for details, please refer to [#5934](https://github.com/apache/spark/pull/5934/files#r30949751).
      
      Author: Zhang, Liye <liye.zhang@intel.com>
      
      Closes #6395 from liyezhang556520/kryoTest and squashes the following commits:
      
      da214c8 [Zhang, Liye] refine Kryo test suite accroding to Josh's comments
    • [DOCS] [MLLIB] Fixing misformatted links in v1.4 MLlib Naive Bayes... · e5a63a0e
      Mike Dusenberry authored
      [DOCS] [MLLIB] Fixing misformatted links in v1.4 MLlib Naive Bayes documentation by removing space and newline characters.
      
      A couple of links in the MLlib Naive Bayes documentation for v1.4 were broken due to the addition of either space or newline characters between the link title and link URL in the markdown doc.  (Interestingly enough, they are rendered correctly in the GitHub viewer, but not when compiled to HTML by Jekyll.)
      
      Author: Mike Dusenberry <dusenberrymw@gmail.com>
      
      Closes #6412 from dusenberrymw/Fix_Broken_Links_In_MLlib_Naive_Bayes_Docs and squashes the following commits:
      
      91a4028 [Mike Dusenberry] Fixing misformatted links by removing space and newline characters.
    • [SPARK-7806][EC2] Fixes that allow the spark_ec2.py tool to run with Python3 · 8dbe7777
      meawoppl authored
      I have used this script to launch, destroy, start, and stop clusters successfully.
      
      Author: meawoppl <meawoppl@gmail.com>
      
      Closes #6336 from meawoppl/py3ec2spark and squashes the following commits:
      
      2e87046 [meawoppl] Py3 compat fixes.
    • [SPARK-7339] [PYSPARK] PySpark shuffle spill memory is sometimes not correct · 8948ad3f
      linweizhong authored
      In PySpark we get the memory used before and after a spill, then use the difference of these two values as memorySpilled. But if the before value is smaller than the after value, we get a negative value; in this scenario a value of 0 may be more reasonable.
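      The idea of the fix, sketched generically (the actual change is in PySpark's shuffle code; Scala is used here only for consistency with the other sketches):
      
      ```scala
      // GC between the two readings can make `after` exceed `before`;
      // clamping at zero avoids reporting a negative spill metric.
      def spilledBytes(usedBeforeSpill: Long, usedAfterSpill: Long): Long =
        math.max(0L, usedBeforeSpill - usedAfterSpill)
      ```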
      
      Below is the result in HistoryServer we have tested:
      | Index | ID | Attempt | Status | Locality Level | Executor ID / Host | Launch Time | Duration | GC Time | Input Size / Records | Write Time | Shuffle Write Size / Records | Shuffle Spill (Memory) | Shuffle Spill (Disk) | Errors |
      |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
      | 0 | 0 | 0 | SUCCESS | NODE_LOCAL | 3 / vm119 | 2015/05/04 17:31:06 | 21 s | 0.1 s | 128.1 MB (hadoop) / 3237 | 70 ms | 10.1 MB / 2529 | 0.0 B | 5.7 MB | |
      | 2 | 2 | 0 | SUCCESS | NODE_LOCAL | 1 / vm118 | 2015/05/04 17:31:06 | 22 s | 89 ms | 128.1 MB (hadoop) / 3205 | 0.1 s | 10.1 MB / 2529 | -1048576.0 B | 5.9 MB | |
      | 1 | 1 | 0 | SUCCESS | NODE_LOCAL | 2 / vm117 | 2015/05/04 17:31:06 | 22 s | 0.1 s | 128.1 MB (hadoop) / 3271 | 68 ms | 10.1 MB / 2529 | -1048576.0 B | 5.6 MB | |
      | 4 | 4 | 0 | SUCCESS | NODE_LOCAL | 2 / vm117 | 2015/05/04 17:31:06 | 22 s | 0.1 s | 128.1 MB (hadoop) / 3192 | 51 ms | 10.1 MB / 2529 | -1048576.0 B | 5.9 MB | |
      | 3 | 3 | 0 | SUCCESS | NODE_LOCAL | 3 / vm119 | 2015/05/04 17:31:06 | 22 s | 0.1 s | 128.1 MB (hadoop) / 3262 | 51 ms | 10.1 MB / 2529 | 1024.0 KB | 5.8 MB | |
      | 5 | 5 | 0 | SUCCESS | NODE_LOCAL | 1 / vm118 | 2015/05/04 17:31:06 | 22 s | 89 ms | 128.1 MB (hadoop) / 3256 | 93 ms | 10.1 MB / 2529 | -1048576.0 B | 5.7 MB | |
      
      /cc davies
      
      Author: linweizhong <linweizhong@huawei.com>
      
      Closes #5887 from Sephiroth-Lin/spark-7339 and squashes the following commits:
      
      9186c81 [linweizhong] Use max function to get a nonnegative value
      d41672b [linweizhong] Update MemoryBytesSpilled when memorySpilled > 0
    • [CORE] [TEST] Fix SimpleDateParamTest · bf49c221
      scwf authored
      ```
      sbt.ForkMain$ForkError: 1424424077190 was not equal to 1424474477190
      	at org.scalatest.MatchersHelper$.newTestFailedException(MatchersHelper.scala:160)
      	at org.scalatest.Matchers$ShouldMethodHelper$.shouldMatcher(Matchers.scala:6231)
      	at org.scalatest.Matchers$AnyShouldWrapper.should(Matchers.scala:6265)
      	at org.apache.spark.status.api.v1.SimpleDateParamTest$$anonfun$1.apply$mcV$sp(SimpleDateParamTest.scala:25)
      	at org.apache.spark.status.api.v1.SimpleDateParamTest$$anonfun$1.apply(SimpleDateParamTest.scala:23)
      	at org.apache.spark.status.api.v1.SimpleDateParamTest$$anonfun$1.apply(SimpleDateParamTest.scala:23)
      	at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
      	at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
      	at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
      	at org.scalatest.Transformer.apply(Transformer.scala:22)
      	at org.scalatest.Transformer.apply(Transformer.scala:20)
      	at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
      	at org.scalatest.Suite$class.withFixture(Suite.scala:
      ```
      
      Set the timezone to fix SimpleDateParamTest.
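      A sketch of the pinning (the exact date pattern is an assumption about the test's `SimpleDateParam` format):
      
      ```scala
      import java.text.SimpleDateFormat
      import java.util.TimeZone
      
      // Parse against an explicit GMT zone so the expected epoch millis do
      // not depend on the build machine's default timezone.
      val format = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSz")
      format.setTimeZone(TimeZone.getTimeZone("GMT"))
      val millis = format.parse("2015-02-20T17:21:17.190GMT").getTime
      ```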
      
      Author: scwf <wangfei1@huawei.com>
      Author: Fei Wang <wangfei1@huawei.com>
      
      Closes #6377 from scwf/fix-SimpleDateParamTest and squashes the following commits:
      
      b8df1e5 [Fei Wang] Update SimpleDateParamSuite.scala
      8bb74f0 [scwf] fix SimpleDateParamSuite
    • [SPARK-7042] [BUILD] use the standard akka artifacts with hadoop-2.x · 43aa819c
      Konstantin Shaposhnikov authored
      Both akka 2.3.x and hadoop-2.x use protobuf 2.5, so only the hadoop-1 build needs the custom 2.3.4-spark akka version that shades protobuf 2.5.
      
      This partially fixes SPARK-7042 (for hadoop-2.x builds).
      
      Author: Konstantin Shaposhnikov <Konstantin.Shaposhnikov@sc.com>
      
      Closes #6341 from kostya-sh/SPARK-7042 and squashes the following commits:
      
      7eb8c60 [Konstantin Shaposhnikov] [SPARK-7042][BUILD] use the standard akka artifacts with hadoop-2.x