  1. Nov 17, 2015
    • [SPARK-11732] Removes some MiMa false positives · fa603e08
      Timothy Hunter authored
      This adds an extra filter for private or protected classes; previously we only filtered for package-private ones.
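
      A minimal sketch of such a visibility check, assuming the ignore generator inspects classes via Scala runtime reflection (names and context here are illustrative, not the actual patch):

      import scala.reflect.runtime.universe._

      // Symbols for private or protected classes are not part of the public API,
      // so the binary-compatibility checker can safely skip them.
      def isPrivateOrProtected(sym: Symbol): Boolean =
        sym.isPrivate || sym.isProtected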
      
      Author: Timothy Hunter <timhunter@databricks.com>
      
      Closes #9697 from thunterdb/spark-11732.
    • [SPARK-11767] [SQL] limit the size of cached batch · 5aca6ad0
      Davies Liu authored
      Currently the size of a cached batch is only controlled by `batchSize` (default value 10000), which does not work well with the size of the serialized columns (for example, complex types). The memory used to build the batch is not accounted for, so it's easy to OOM (especially after unified memory management).
      
      This PR introduces a hard limit of 4MB for the total column size (up to 50 columns of uncompressed primitive values).
      
      This also changes the way the buffer grows: double it each time it fills up, then trim it once finished.
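
      A hedged sketch of that growth strategy (simplified; not the actual column builder code):

      import java.nio.ByteBuffer

      // Double the buffer whenever it runs out of room for the next write...
      def grow(buf: ByteBuffer, needed: Int): ByteBuffer =
        if (buf.remaining() >= needed) buf
        else {
          val grown = ByteBuffer.allocate(math.max(buf.capacity() * 2, buf.position() + needed))
          buf.flip()
          grown.put(buf)
        }

      // ...and trim the excess once the batch is finished.
      def trim(buf: ByteBuffer): ByteBuffer = {
        buf.flip()
        ByteBuffer.allocate(buf.limit()).put(buf)
      }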
      
      cc liancheng
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #9760 from davies/cache_limit.
    • [SPARK-11769][ML] Add save, load to all basic Transformers · d98d1cb0
      Joseph K. Bradley authored
      This excludes Estimators and transformers whose Params or data involve Vector and other non-basic types. This adds save and load to the following (see the usage sketch after the list):
      * Bucketizer
      * DCT
      * HashingTF
      * Interaction
      * NGram
      * Normalizer
      * OneHotEncoder
      * PolynomialExpansion
      * QuantileDiscretizer
      * RFormula
      * SQLTransformer
      * StopWordsRemover
      * StringIndexer
      * Tokenizer
      * VectorAssembler
      * VectorSlicer
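
      A minimal usage sketch of the new persistence methods (the path and param values are hypothetical):

      import org.apache.spark.ml.feature.Bucketizer

      val bucketizer = new Bucketizer()
        .setInputCol("raw")
        .setOutputCol("bucketed")
        .setSplits(Array(Double.NegativeInfinity, 0.0, 10.0, Double.PositiveInfinity))

      bucketizer.save("/tmp/bucketizer")                // persist the params to the path
      val restored = Bucketizer.load("/tmp/bucketizer") // read them back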
      
      CC: mengxr
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #9755 from jkbradley/transformer-io.
    • [SPARK-10186][SQL] support Postgres array type in JDBCRDD · d9251496
      Wenchen Fan authored
      Add ARRAY support to `PostgresDialect`.
      
      Nested ARRAY is not allowed for now because it's hard to get the array dimension info. See http://stackoverflow.com/questions/16619113/how-to-get-array-base-type-in-postgres-via-jdbc
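
      A hedged sketch (not the actual PostgresDialect patch) of how a JdbcDialect can map a Postgres array column to a Catalyst ArrayType; the element-type mapping is reduced to two cases for illustration:

      import java.sql.Types
      import org.apache.spark.sql.jdbc.JdbcDialect
      import org.apache.spark.sql.types._

      object ExamplePostgresArrayDialect extends JdbcDialect {
        override def canHandle(url: String): Boolean = url.startsWith("jdbc:postgresql")

        override def getCatalystType(
            sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] =
          if (sqlType == Types.ARRAY) {
            // Postgres reports array types with a leading underscore, e.g. "_int4".
            typeName match {
              case "_int4" => Some(ArrayType(IntegerType))
              case "_text" => Some(ArrayType(StringType))
              case _       => None
            }
          } else None
      }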
      
      Thanks to mariusvniekerk for the initial work!

      Closes https://github.com/apache/spark/pull/9137
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #9662 from cloud-fan/postgre.
    • [SPARK-8658][SQL][FOLLOW-UP] AttributeReference's equals method compares all the members · 0158ff77
      gatorsmile authored
      Based on cloud-fan's comment in https://github.com/apache/spark/pull/9216, this updates AttributeReference's hashCode function to include the hash codes of the other members: name, nullable, and qualifiers.
      
      Here, I am not 100% sure whether we should include name in the hashCode calculation, since the original calculation does not include it.
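
      For illustration only, the usual way such member hash codes get combined (the field set here is simplified, not the actual AttributeReference):

      case class Attr(name: String, nullable: Boolean, qualifiers: Seq[String])

      def attrHashCode(a: Attr): Int = {
        var h = 17
        h = h * 37 + a.name.hashCode        // inclusion of name is the open question above
        h = h * 37 + a.nullable.hashCode
        h = h * 37 + a.qualifiers.hashCode
        h
      }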
      
      marmbrus cloud-fan Please review whether the changes are good.
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #9761 from gatorsmile/hashCodeNamedExpression.
    • [SPARK-11089][SQL] Adds option for disabling multi-session in Thrift server · 7b1407c7
      Cheng Lian authored
      This PR adds a new option `spark.sql.hive.thriftServer.singleSession` for disabling multi-session support in the Thrift server.
      
      Note that this option is added as a Spark configuration (retrieved from `SparkConf`) rather than a Spark SQL configuration (retrieved from `SQLConf`), because all SQL configurations are session-ized: with multi-session support on by default, no JDBC connection could modify a global configuration like the newly added one.
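
      A hedged example of setting the new flag (only the option name comes from this commit; the surrounding code is illustrative):

      import org.apache.spark.SparkConf

      // Fall back to the old single-session behavior for the Thrift server.
      val conf = new SparkConf()
        .set("spark.sql.hive.thriftServer.singleSession", "true")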
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #9740 from liancheng/spark-11089.single-session-option.
    • [SPARK-11679][SQL] Invoking method "apply(fields: java.util.List[StructField])" in "StructType" gets ClassCastException · e8833dd1
      mayuanwen authored
      
      In the previous method, fields.toArray casts java.util.List[StructField] into Array[Object], which cannot be cast into Array[StructField]; thus invoking this method throws "java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Lorg.apache.spark.sql.types.StructField;".
      This patch directly converts java.util.List[StructField] into Array[StructField].
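
      A sketch of the fix described above (simplified, not the exact patch): java.util.List#toArray() returns Array[Object], so the caller must ask for a typed array instead.

      import java.util.{List => JList}
      import org.apache.spark.sql.types.{StructField, StructType}

      def fromJavaList(fields: JList[StructField]): StructType =
        // Passing a typed target array makes the JVM allocate an Array[StructField]
        // directly, avoiding the Object[] -> StructField[] ClassCastException.
        StructType(fields.toArray(new Array[StructField](fields.size())))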
      
      Author: mayuanwen <mayuanwen@qiyi.com>
      
      Closes #9649 from jackieMaKing/Spark-11679.
    • [SPARK-11766][MLLIB] add toJson/fromJson to Vector/Vectors · 21fac543
      Xiangrui Meng authored
      This is to support JSON serialization of Param[Vector] in the pipeline API. It could be used for other purposes too. The schema is the same as `VectorUDT`. jkbradley
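
      A minimal round-trip sketch using the methods named in the title (the vector values are illustrative):

      import org.apache.spark.mllib.linalg.Vectors

      val v = Vectors.sparse(4, Array(0, 3), Array(1.0, -2.0))
      val json = v.toJson                 // serialized with the same schema as VectorUDT
      val parsed = Vectors.fromJson(json) // parses back to an equal vector
      assert(parsed == v)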
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #9751 from mengxr/SPARK-11766.
    • [SPARK-11695][CORE] Set s3a credentials · cc567b66
      Chris Bannister authored
      Set s3a credentials when creating a new default Hadoop configuration.
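
      A hedged sketch of the idea (the helper name is hypothetical; the fs.s3a.* keys are the standard Hadoop ones):

      import org.apache.hadoop.conf.Configuration

      // Propagate AWS credentials into the s3a keys of a freshly created Hadoop configuration.
      def setS3aCredentials(conf: Configuration, accessKey: String, secretKey: String): Unit = {
        conf.set("fs.s3a.access.key", accessKey)
        conf.set("fs.s3a.secret.key", secretKey)
      }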
      
      Author: Chris Bannister <chris.bannister@swiftkey.com>
      
      Closes #9663 from Zariel/set-s3a-creds.
    • [SPARK-11744][LAUNCHER] Fix print version throwing an exception when using the pyspark shell · 6fc2740e
      jerryshao authored
      Exception details can be seen here (https://issues.apache.org/jira/browse/SPARK-11744).
      
      Author: jerryshao <sshao@hortonworks.com>
      
      Closes #9721 from jerryshao/SPARK-11744.
    • [SPARK-11779][DOCS] Fix reference to deprecated MESOS_NATIVE_LIBRARY · 15cc36b7
      Philipp Hoffmann authored
      MESOS_NATIVE_LIBRARY was renamed in favor of MESOS_NATIVE_JAVA_LIBRARY. This commit fixes the reference in the documentation.
      
      Author: Philipp Hoffmann <mail@philipphoffmann.de>
      
      Closes #9768 from philipphoffmann/patch-2.
    • [SPARK-11751] Fix a documentation error in the "Spark Streaming Programming Guide" page · 7276fa9a
      yangping.wu authored
      In the **[Task Launching Overheads](http://spark.apache.org/docs/latest/streaming-programming-guide.html#task-launching-overheads)** section,
      >Task Serialization: Using Kryo serialization for serializing tasks can reduce the task sizes, and therefore reduce the time taken to send them to the slaves.
      
      As we know, **Task Serialization** is configured by the **spark.closure.serializer** parameter, but currently only the Java serializer is supported. If we set **spark.closure.serializer** to **org.apache.spark.serializer.KryoSerializer**, this will throw an exception.
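
      A hedged illustration of the misconfiguration the doc wrongly suggests; per this report, only the Java serializer is supported for closures, so the following fails at runtime:

      import org.apache.spark.SparkConf

      val conf = new SparkConf()
        // Unsupported: closure serialization currently works only with JavaSerializer.
        .set("spark.closure.serializer", "org.apache.spark.serializer.KryoSerializer")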
      
      Author: yangping.wu <wyphao.2007@163.com>
      
      Closes #9734 from 397090770/397090770-patch-1.
    • [SPARK-11191][SQL][FOLLOW-UP] Cleans up unnecessary anonymous HiveFunctionRegistry · fa13301a
      Cheng Lian authored
      According to discussion in PR #9664, the anonymous `HiveFunctionRegistry` in `HiveContext` can be removed now.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #9737 from liancheng/spark-11191.follow-up.
    • [MINOR] [SQL] Fix randomly generated ArrayData in RowEncoderSuite · d79d8b08
      Liang-Chi Hsieh authored
      The randomly generated ArrayData used for the UDT `ExamplePoint` in `RowEncoderSuite` sometimes doesn't have enough elements, in which case the test fails. This patch fixes that.
      
      Author: Liang-Chi Hsieh <viirya@appier.com>
      
      Closes #9757 from viirya/fix-randomgenerated-udt.
    • [SPARK-11447][SQL] change NullType to StringType during binaryComparison between NullType and StringType · e01865af
      Kevin Yu authored
      
      During execution of the PromoteStrings rule, if one side of a binaryComparison is StringType and the other side is not, the current code promotes (casts) the StringType side to DoubleType; if the string doesn't contain a number, the cast produces a null value. So when doing <=> (null-safe equal) with null, it will not filter anything, which causes the problem reported by this JIRA.
      
      I propose these changes through this PR; could you review my code changes?
      
      This problem only happens for <=>; other operators work fine.
      
      scala> val filteredDF = df.filter(df("column") > (new Column(Literal(null))))
      filteredDF: org.apache.spark.sql.DataFrame = [column: string]
      
      scala> filteredDF.show
      +------+
      |column|
      +------+
      +------+
      
      scala> val filteredDF = df.filter(df("column") === (new Column(Literal(null))))
      filteredDF: org.apache.spark.sql.DataFrame = [column: string]
      
      scala> filteredDF.show
      +------+
      |column|
      +------+
      +------+
      
      scala> df.registerTempTable("DF")
      
      scala> sqlContext.sql("select * from DF where 'column' = NULL")
      res27: org.apache.spark.sql.DataFrame = [column: string]
      
      scala> res27.show
      +------+
      |column|
      +------+
      +------+
      
      Author: Kevin Yu <qyu@us.ibm.com>
      
      Closes #9720 from kevinyu98/working_on_spark-11447.
    • [SPARK-11694][FOLLOW-UP] Clean up imports, use a common function for metadata and add a test for FIXED_LEN_BYTE_ARRAY · 75d20207
      hyukjinkwon authored
      
      As discussed in https://github.com/apache/spark/pull/9660 and https://github.com/apache/spark/pull/9060, I cleaned up unused imports, added a test for fixed-length byte arrays, and used a common function for writing Parquet metadata.
      
      For the fixed-length byte array test, I verified the encoding types with [parquet-tools](https://github.com/Parquet/parquet-mr/tree/master/parquet-tools).
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #9754 from HyukjinKwon/SPARK-11694-followup.
  2. Nov 16, 2015