  1. Dec 31, 2015
    • [SPARK-3873][STREAMING] Import order fixes for streaming. · efb10cc9
      Marcelo Vanzin authored
      Also included a few miscellaneous other modules that had very few violations.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #10532 from vanzin/SPARK-3873-streaming.
    • [SPARK-12039][SQL] Re-enable HiveSparkSubmitSuite's SPARK-9757 Persist Parquet... · 5cdecb18
      Yin Huai authored
      [SPARK-12039][SQL] Re-enable HiveSparkSubmitSuite's SPARK-9757 Persist Parquet relation with decimal column
      
      https://issues.apache.org/jira/browse/SPARK-12039
      
      Since we no longer support Hadoop 1, we can re-enable this test in master.
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #10533 from yhuai/SPARK-12039-enable.
    • [SPARK-7995][SPARK-6280][CORE] Remove AkkaRpcEnv and remove systemName from setupEndpointRef · 4f5a24d7
      Shixiong Zhu authored
      ### Remove AkkaRpcEnv
      
      Keep `SparkEnv.actorSystem` because Streaming still uses it. Will remove it and AkkaUtils after refactoring Streaming actorStream API.
      
      ### Remove systemName
      There are 2 places using `systemName`:
      * `RpcEnvConfig.name`. Actually, although it's used as `systemName` in `AkkaRpcEnv`, `NettyRpcEnv` uses it as the service name to output the log `Successfully started service *** on port ***`. Since the service name in log is useful, I keep `RpcEnvConfig.name`.
      * `def setupEndpointRef(systemName: String, address: RpcAddress, endpointName: String)`. Each `ActorSystem` has a `systemName`. Akka requires `systemName` in its URI and will refuse a connection if `systemName` is not matched. However, `NettyRpcEnv` doesn't use it. So we can remove `systemName` from `setupEndpointRef` since we are removing `AkkaRpcEnv`.
      
      ### Remove RpcEnv.uriOf
      
      `uriOf` exists because Akka uses different URI formats depending on whether authentication is enabled, e.g., `akka.ssl.tcp...` and `akka.tcp://...`. `NettyRpcEnv` uses the same format in both cases, so `uriOf` is no longer necessary once `AkkaRpcEnv` is removed.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #10459 from zsxwing/remove-akka-rpc-env.
    • [SPARK-12585] [SQL] move numFields to constructor of UnsafeRow · e6c77874
      Davies Liu authored
      Right now, numFields is passed in through pointTo(), which then calculates bitSetWidthInBytes, making pointTo() a little heavy.

      It should be part of the constructor of UnsafeRow.
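
      As a rough illustration of the idea (not the actual `UnsafeRow` code), the sketch below uses a hypothetical `RowSketch` class. It assumes a null-tracking bitset with one bit per field, rounded up to whole 8-byte words. Because the width depends only on `numFields`, it can be computed once in the constructor, and `pointTo()` is left with plain assignments.

      ```scala
      // Hypothetical, simplified stand-in for UnsafeRow (not the real class).
      object UnsafeRowSketch {
        // One null-tracking bit per field, rounded up to whole 8-byte words (assumed layout).
        def calculateBitSetWidthInBytes(numFields: Int): Int =
          ((numFields + 63) / 64) * 8

        final class RowSketch(val numFields: Int) {
          require(numFields >= 0, "numFields must be non-negative")

          // Computed once at construction time instead of on every pointTo() call.
          val bitSetWidthInBytes: Int = calculateBitSetWidthInBytes(numFields)

          private var baseObject: AnyRef = null
          private var baseOffset: Long = 0L
          private var sizeInBytes: Int = 0

          // pointTo() now only re-targets the row at another memory region.
          def pointTo(obj: AnyRef, offset: Long, size: Int): Unit = {
            baseObject = obj
            baseOffset = offset
            sizeInBytes = size
          }

          def getBaseObject: AnyRef = baseObject
          def getBaseOffset: Long = baseOffset
          def getSizeInBytes: Int = sizeInBytes
        }
      }
      ```

      With this shape, a row can be created once per schema and re-pointed at many buffers without repeating the bitset arithmetic.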
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #10528 from davies/numFields.
  2. Dec 30, 2015
  3. Dec 29, 2015
    • [SPARK-12490][CORE] Limit the css style scope to fix the Streaming UI · 7ab0e228
      Shixiong Zhu authored
      #10441 broke the Streaming UI because of the new CSS style.
      
      <img width="503" alt="screen shot 2015-12-29 at 4 49 04 pm" src="https://cloud.githubusercontent.com/assets/1000778/12044763/1efce0fe-ae4c-11e5-9f8b-39df08426bf8.png">
      
      This PR just adds a class for the new style and applies it only to the paged tables.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #10517 from zsxwing/fix-streaming-ui.
    • [SPARK-12362][SQL][WIP] Inline Hive Parser · b600bccf
      Nong Li authored
      This is a WIP. The PR has been taken over from nongli (see https://github.com/apache/spark/pull/10420). I have removed some additional dead code, and fixed a few issues which were caused by the fact that the inlined Hive parser is newer than the Hive parser we currently use in Spark.
      
      I am submitting this PR in order to get some feedback and testing done. There is quite a bit of work to do:
      - [ ] Get it to pass jenkins build/test.
      - [ ] Acknowledge the Hive project for using their parser.
      - [ ] Refactorings between HiveQl and the java classes.
        - [ ] Create our own ASTNode and integrate the current implicit extensions.
        - [ ] Move remaining ```SemanticAnalyzer``` and ```ParseUtils``` functionality to ```HiveQl```.
      - [ ] Removing Hive dependencies from the parser. This will require some edits in the grammar files.
        - [ ] Introduce our own context which needs to contain a ```TokenRewriteStream```.
        - [ ] Add ```useSQL11ReservedKeywordsForIdentifier``` and ```allowQuotedId``` to the catalyst or sql configuration.
        - [ ] Remove ```HiveConf``` from grammar files & HiveQl, and pass in our own configuration.
      - [ ] Moving the parser into sql/core.
      
      cc nongli rxin
      
      Author: Herman van Hovell <hvanhovell@questtec.nl>
      Author: Nong Li <nong@databricks.com>
      Author: Nong Li <nongli@gmail.com>
      
      Closes #10509 from hvanhovell/SPARK-12362.
    • [SPARK-12549][SQL] Take Option[Seq[DataType]] in UDF input type specification. · 270a6595
      Reynold Xin authored
      In Spark we allow UDFs to declare their expected input types in order to apply type coercion. The expected input type parameter takes a Seq[DataType] and uses Nil when no type coercion is applied. It makes more sense to take Option[Seq[DataType]] instead, so we can differentiate a no-arg function from a function with no expected input type specified.
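
      The distinction can be sketched with a small, self-contained model. The `DataType` hierarchy and the `describe` helper below are hypothetical stand-ins for illustration, not Spark's API.

      ```scala
      // Hypothetical, simplified model of a UDF's declared input types (not Spark's API).
      object UdfInputTypesSketch {
        sealed trait DataType
        case object IntegerType extends DataType
        case object StringType extends DataType

        // None      => no expected input types declared, so no coercion is applied
        // Some(Nil) => a function that declares zero arguments
        // Some(ts)  => arguments are coerced to the listed types
        def describe(inputTypes: Option[Seq[DataType]]): String = inputTypes match {
          case None                   => "unspecified: skip type coercion"
          case Some(ts) if ts.isEmpty => "no-arg function"
          case Some(ts)               => s"coerce arguments to ${ts.mkString(", ")}"
        }
      }
      ```

      With a plain `Seq[DataType]`, the first two cases would both be `Nil` and could not be told apart.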
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #10504 from rxin/SPARK-12549.
    • [SPARK-12349][SPARK-12349][ML] Fix typo in Spark version regex introduced in / PR 10327 · be86268e
      Sean Owen authored
      Sorry jkbradley
      Ref: https://github.com/apache/spark/pull/10327#discussion_r48502942
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #10508 from srowen/SPARK-12349.2.
    • [SPARK-11199][SPARKR] Improve R context management story and add getOrCreate · f6ecf143
      Hossein authored
      * Changes api.r.SQLUtils to use ```SQLContext.getOrCreate``` instead of creating a new context.
      * Adds a simple test
      
      [SPARK-11199] #comment link with JIRA
      
      Author: Hossein <hossein@databricks.com>
      
      Closes #9185 from falaki/SPARK-11199.
    • [SPARK-12530][BUILD] Fix build break at Spark-Master-Maven-Snapshots from #1293 · 8e629b10
      Kazuaki Ishizaki authored
      The compilation error is caused by string concatenations that are not constant expressions.
      Use a raw string literal to avoid the string concatenations.
      
      https://amplab.cs.berkeley.edu/jenkins/view/Spark-Packaging/job/Spark-Master-Maven-Snapshots/1293/
      
      Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
      
      Closes #10488 from kiszk/SPARK-12530.
    • [SPARK-12526][SPARKR] `ifelse`, `when`, `otherwise` unable to take Column as value · d80cc90b
      Forest Fang authored
      `ifelse`, `when`, `otherwise` are unable to take `Column`-typed S4 objects as values.
      
      For example:
      ```r
      ifelse(lit(1) == lit(1), lit(2), lit(3))
      ifelse(df$mpg > 0, df$mpg, 0)
      ```
      will both fail with
      ```r
      attempt to replicate an object of type 'environment'
      ```
      
      The PR replaces `ifelse` calls with `if ... else ...` inside the function implementations to avoid attempting to vectorize (i.e. `rep()`). It remains to be discussed whether we should instead support vectorization in these functions for consistency, because `ifelse` in base R is vectorized, but I cannot foresee any scenario in which these functions would need to be vectorized in SparkR.
      
      For reference, added test cases which trigger failures:
      ```r
      . Error: when(), otherwise() and ifelse() with column on a DataFrame ----------
      error in evaluating the argument 'x' in selecting a method for function 'collect':
        error in evaluating the argument 'col' in selecting a method for function 'select':
        attempt to replicate an object of type 'environment'
      Calls: when -> when -> ifelse -> ifelse
      
      1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls, message = function(c) invokeRestart("muffleMessage"))
      2: eval(code, new_test_environment)
      3: eval(expr, envir, enclos)
      4: expect_equal(collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))[, 1], c(NA, 1)) at test_sparkSQL.R:1126
      5: expect_that(object, equals(expected, label = expected.label, ...), info = info, label = label)
      6: condition(object)
      7: compare(actual, expected, ...)
      8: collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))
      Error: Test failures
      Execution halted
      ```
      
      Author: Forest Fang <forest.fang@outlook.com>
      
      Closes #10481 from saurfang/spark-12526.
  4. Dec 28, 2015
    • [SPARK-11394][SQL] Throw IllegalArgumentException for unsupported types in postgresql · 73862a1e
      Takeshi YAMAMURO authored
      If a DataFrame has BYTE types, an exception is thrown:
      org.postgresql.util.PSQLException: ERROR: type "byte" does not exist
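
      A minimal sketch of the fail-fast behaviour named in the title. The `DataType` hierarchy, the `jdbcTypeName` helper, the type names and the error message are all hypothetical; the real dialect code may look different.

      ```scala
      // Hypothetical, simplified type mapping (not Spark's JDBC dialect API).
      object PostgresTypeSketch {
        sealed trait DataType
        case object StringType extends DataType
        case object BooleanType extends DataType
        case object ByteType extends DataType

        // Maps a column type to a PostgreSQL type name and rejects types that have
        // no counterpart before any SQL reaches the database.
        def jdbcTypeName(dt: DataType): String = dt match {
          case StringType  => "TEXT"
          case BooleanType => "BOOLEAN"
          case ByteType =>
            throw new IllegalArgumentException(s"Unsupported type in postgresql: $dt")
        }
      }
      ```

      Failing in the mapping step gives the caller a clear error instead of a `PSQLException` raised by the server.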
      
      Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>
      
      Closes #9350 from maropu/FixBugInPostgreJdbc.
    • [SPARK-12547][SQL] Tighten scala style checker enforcement for UDF registration · 1a91be80
      Reynold Xin authored
      We use scalastyle:off to turn off style checks in certain places where it is not possible to follow the style guide. This is usually ok. However, in UDF registration, we disable the checker for a large amount of code simply because some lines exceed the 100-character line limit. It is better to disable just the line limit check rather than everything.

      In this pull request, I disabled only the line length check, and fixed a problem (lack of explicit types for public methods).
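
      For illustration, scalastyle's comment filter accepts a rule id, so a single check can be switched off. The sketch assumes the line-length rule id is `line.size.limit`; `UdfRegistrationSketch` and `register` are hypothetical names.

      ```scala
      object UdfRegistrationSketch {
        // scalastyle:off line.size.limit
        // Only the line-length check is disabled here; all other style rules still apply.
        // The public method declares an explicit return type.
        def register[A1, A2, RT](name: String, func: (A1, A2) => RT): (String, (A1, A2) => RT) = (name, func)
        // scalastyle:on line.size.limit
      }
      ```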
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #10501 from rxin/SPARK-12547.
    • [SPARK-12522][SQL][MINOR] Add the missing document strings for the SQL configuration · 04313581
      gatorsmile authored
      Adds the missing documentation for the following configuration properties. Their descriptions show up as "TODO" when issuing the command "SET -V".
      ```
      spark.sql.columnNameOfCorruptRecord
      spark.sql.hive.verifyPartitionPath
      spark.sql.sources.parallelPartitionDiscovery.threshold
      spark.sql.hive.convertMetastoreParquet.mergeSchema
      spark.sql.hive.convertCTAS
      spark.sql.hive.thriftServer.async
      ```
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #10471 from gatorsmile/commandDesc.
    • [SPARK-12490] Don't use Javascript for web UI's paginated table controls · 124a3a5e
      Josh Rosen authored
      The web UI's paginated table uses Javascript to implement certain navigation controls, such as table sorting and the "go to page" form. This is unnecessary and should be simplified to use plain HTML form controls and links.
      
      /cc zsxwing, who wrote this original code, and yhuai.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10441 from JoshRosen/simplify-paginated-table-sorting.
    • [SPARK-12489][CORE][SQL][MLIB] Fix minor issues found by FindBugs · 710b4117
      Shixiong Zhu authored
      Include the following changes:
      
      1. Close `java.sql.Statement`
      2. Fix incorrect `asInstanceOf`.
      3. Remove unnecessary `synchronized` and `ReentrantLock`.
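
      The first item can be illustrated with a generic try/finally helper. This is one common way to guarantee that a resource is closed, not necessarily how the commit does it; `withResource` and `FakeStatement` are hypothetical names, with `FakeStatement` standing in for `java.sql.Statement`.

      ```scala
      object CloseResourceSketch {
        // Runs `body` with the resource and always closes it, even if `body` throws.
        def withResource[R <: AutoCloseable, T](resource: R)(body: R => T): T =
          try body(resource) finally resource.close()

        // Hypothetical stand-in for java.sql.Statement that records whether it was closed.
        final class FakeStatement extends AutoCloseable {
          var closed: Boolean = false
          def executeUpdate(sql: String): Int = if (sql.nonEmpty) 1 else 0
          override def close(): Unit = closed = true
        }
      }
      ```

      Since `java.sql.Statement` is an `AutoCloseable`, the same helper would accept a real statement.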
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #10440 from zsxwing/findbugs.
    • [SPARK-12525] Fix fatal compiler warnings in Kinesis ASL due to @transient annotations · fb572c6e
      Josh Rosen authored
      The Scala 2.11 SBT build currently fails for Spark 1.6.0 and master due to warnings about the `@transient` annotation:
      
      ```
      [error] [warn] /Users/joshrosen/Documents/spark/extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala:73: no valid targets for annotation on value sc - it is discarded unused. You may specify targets with meta-annotations, e.g. @(transient @param)
      [error] [warn]     @transient sc: SparkContext,
      ```
      
      The fix implemented here is the same as what we did in #8433: remove the `@transient` annotations where they are not necessary and use `@transient private val` in the remaining cases.
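
      A small sketch of the `@transient private val` form, using plain Java serialization. `FakeContext` is a hypothetical stand-in for a non-serializable handle such as `SparkContext`, and `BlockSketch` and `roundTrip` are illustrative names only.

      ```scala
      import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

      object TransientSketch {
        // Hypothetical stand-in for a non-serializable driver-side handle.
        final class FakeContext(val appName: String)

        // `private val` gives the annotation a field to attach to, so the handle
        // is skipped during Java serialization.
        class BlockSketch(@transient private val sc: FakeContext, val blockIds: Array[String])
            extends Serializable {
          def contextName: Option[String] = Option(sc).map(_.appName)
        }

        // Serializes and deserializes a value with plain Java serialization.
        def roundTrip[T <: AnyRef](value: T): T = {
          val bytes = new ByteArrayOutputStream()
          val out = new ObjectOutputStream(bytes)
          out.writeObject(value)
          out.close()
          val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
          try in.readObject().asInstanceOf[T] finally in.close()
        }
      }
      ```

      After a round trip the transient field is expected to be `null`, while the remaining fields survive.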
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10479 from JoshRosen/fix-sbt-2.11.
    • [SPARK-12222][CORE] Deserialize RoaringBitmap using Kryo serializer throw... · a6d38532
      Daoyuan Wang authored
      [SPARK-12222][CORE] Deserialize RoaringBitmap using Kryo serializer throw Buffer underflow exception
      
      Since we only need to implement `def skipBytes(n: Int)`,
      code in #10213 could be simplified.
      davies scwf
      
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #10253 from adrian-wang/kryo.
    • [SPARK-12441][SQL] Fixing missingInput in Generate/MapPartitions/AppendColumns/MapGroups/CoGroup · 01ba95d8
      gatorsmile authored
      When we explain any plan with Generate, we see an exclamation mark in the plan. Normally, this mark means the plan has an error. This PR corrects the `missingInput` in `Generate`.
      
      For example,
      ```scala
      val df = Seq((1, "a b c"), (2, "a b"), (3, "a")).toDF("number", "letters")
      val df2 =
        df.explode('letters) {
          case Row(letters: String) => letters.split(" ").map(Tuple1(_)).toSeq
        }
      
      df2.explain(true)
      ```
      Before the fix, the plan is like
      ```
      == Parsed Logical Plan ==
      'Generate UserDefinedGenerator('letters), true, false, None
      +- Project [_1#0 AS number#2,_2#1 AS letters#3]
         +- LocalRelation [_1#0,_2#1], [[1,a b c],[2,a b],[3,a]]
      
      == Analyzed Logical Plan ==
      number: int, letters: string, _1: string
      Generate UserDefinedGenerator(letters#3), true, false, None, [_1#8]
      +- Project [_1#0 AS number#2,_2#1 AS letters#3]
         +- LocalRelation [_1#0,_2#1], [[1,a b c],[2,a b],[3,a]]
      
      == Optimized Logical Plan ==
      Generate UserDefinedGenerator(letters#3), true, false, None, [_1#8]
      +- LocalRelation [number#2,letters#3], [[1,a b c],[2,a b],[3,a]]
      
      == Physical Plan ==
      !Generate UserDefinedGenerator(letters#3), true, false, [number#2,letters#3,_1#8]
      +- LocalTableScan [number#2,letters#3], [[1,a b c],[2,a b],[3,a]]
      ```
      
      **Updates**: The same issue is also found in four other Dataset operators: `MapPartitions`/`AppendColumns`/`MapGroups`/`CoGroup`. All four are fixed as well.
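
      The effect can be modelled with plain sets. This is a simplified sketch that uses strings instead of Catalyst attributes and assumes the generated column counts among the operator's references; `Node` and its members are hypothetical names, not the Catalyst definitions.

      ```scala
      // Hypothetical, simplified model of plan attributes (plain strings instead of Catalyst types).
      object MissingInputSketch {
        final case class Node(
            references: Set[String],         // attributes the operator's expressions refer to
            inputSet: Set[String],           // attributes supplied by the operator's children
            producedAttributes: Set[String]  // attributes the operator generates itself
        ) {
          // Before: anything referenced but not supplied by a child counts as missing.
          def missingInputBefore: Set[String] = references -- inputSet

          // After: attributes the operator produces on its own are no longer reported.
          def missingInputAfter: Set[String] = references -- inputSet -- producedAttributes
        }
      }
      ```

      For the plan above, a node that references `letters#3` and `_1#8`, receives `number#2` and `letters#3` from its child, and produces `_1#8` itself would report `_1#8` as missing under the first definition and nothing under the second.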
      
      Author: gatorsmile <gatorsmile@gmail.com>
      Author: xiaoli <lixiao1983@gmail.com>
      Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>
      
      Closes #10393 from gatorsmile/generateExplain.
    • [SPARK-7727][SQL] Avoid inner classes in RuleExecutor · a6a48124
      Stephan Kessler authored
      Moved the (case) classes Strategy, Once, FixedPoint and Batch to the companion object. This is necessary if we want the Optimizer to be easily extendable in the following sense: usually a user wants to add additional rules and keep the ones that are already there. However, inner classes made that impossible because the code did not compile.

      This allows easy extension of existing Optimizers; see the DefaultOptimizerExtendableSuite for a corresponding test case.
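
      A plausible reading of the compile problem is that an inner class is path-dependent: each executor instance has its own `Batch` type. The sketch below contrasts the two layouts with hypothetical, heavily simplified classes; rules are plain `String => String` functions here, not Catalyst rules.

      ```scala
      object RuleExecutorSketch {
        type Rule = String => String

        // Before: Batch is an inner class, so `a.Batch` and `b.Batch` are different
        // types for two executors `a` and `b`, and batches cannot be shared.
        abstract class InnerExecutor {
          case class Batch(name: String, rules: Rule*)
          def batches: Seq[Batch]
        }

        // After: Batch lives in the companion object and is shared by all executors.
        object Executor {
          case class Batch(name: String, rules: Rule*)
        }
        abstract class Executor {
          import Executor.Batch
          def batches: Seq[Batch]
          def execute(plan: String): String =
            batches.flatMap(_.rules).foldLeft(plan)((p, rule) => rule(p))
        }

        object DefaultOptimizer extends Executor {
          val batches: Seq[Executor.Batch] =
            Seq(Executor.Batch("trim", (s: String) => s.trim))
        }

        // Reusing DefaultOptimizer.batches type-checks because Batch is no longer
        // tied to a particular executor instance.
        object ExtendedOptimizer extends Executor {
          val batches: Seq[Executor.Batch] =
            DefaultOptimizer.batches :+ Executor.Batch("upper", (s: String) => s.toUpperCase)
        }
      }
      ```

      Here `ExtendedOptimizer` keeps the existing batch and appends one of its own, which is the kind of extension the commit message describes.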
      
      Author: Stephan Kessler <stephan.kessler@sap.com>
      
      Closes #10174 from stephankessler/SPARK-7727.
    • [SPARK-12424][ML] The implementation of ParamMap#filter is wrong. · 07165ca0
      Kousuke Saruta authored
      ParamMap#filter uses `mutable.Map#filterKeys`. The return type of `filterKeys` is collection.Map, not mutable.Map, but the result is cast to mutable.Map using `asInstanceOf`, so we get a `ClassCastException`.
      Also, the object returned by Map#filterKeys is not Serializable. This is a Scala issue (https://issues.scala-lang.org/browse/SI-6654).
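
      The failure mode and one way around it can be sketched without Spark. The key format and helper names below are hypothetical, and using `filter` is shown as one possible alternative, not necessarily the change made in this commit. The sketch assumes Scala 2.x collections, where `filterKeys` returns a view rather than a `mutable.Map` (and is deprecated as of 2.13).

      ```scala
      import scala.collection.mutable
      import scala.util.Try

      object FilterKeysSketch {
        // Hypothetical stand-in for the map inside ParamMap: keys are "parentUid__paramName".
        def sample(): mutable.Map[String, Int] =
          mutable.Map("lr__maxIter" -> 10, "lr__regParam" -> 1, "svm__maxIter" -> 5)

        // Problematic: filterKeys does not return a mutable.Map, so the cast is expected to fail.
        def filterWithCast(m: mutable.Map[String, Int], uid: String): Try[mutable.Map[String, Int]] =
          Try(m.filterKeys(_.startsWith(uid + "__")).asInstanceOf[mutable.Map[String, Int]])

        // Safer: filter builds a new mutable.Map directly, so no cast is needed.
        def filterSafely(m: mutable.Map[String, Int], uid: String): mutable.Map[String, Int] =
          m.filter { case (k, _) => k.startsWith(uid + "__") }
      }
      ```

      Because `filter` materializes a new map rather than a lazy view, it also avoids handing out the non-serializable wrapper mentioned above.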
      
      Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
      
      Closes #10381 from sarutak/SPARK-12424.
    • [SPARK-12287][SQL] Support UnsafeRow in MapPartitions/MapGroups/CoGroup · e01c6c86
      gatorsmile authored
      Support Unsafe Row in MapPartitions/MapGroups/CoGroup.
      
      Added a test case for MapPartitions. Since MapGroups and CoGroup are built on AppendColumns, all the related dataset test cases can already verify correctness when MapGroups and CoGroup process unsafe rows.
      
      davies cloud-fan Not sure if my understanding is right, please correct me. Thank you!
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #10398 from gatorsmile/unsafeRowMapGroup.
    • [SPARK-12517] add default RDD name for one created via sc.textFile · 73b70f07
      Yaron Weinsberg authored
      The feature was first added in commit 7b877b27 but was later removed (probably by mistake) in commit fc8b5819.
      This change sets the default name of RDDs created via sc.textFile(...) to the path argument.
      
      Here is the symptom:
      
      * Using spark-1.5.2-bin-hadoop2.6:
      
      scala> sc.textFile("/home/root/.bashrc").name
      res5: String = null
      
      scala> sc.binaryFiles("/home/root/.bashrc").name
      res6: String = /home/root/.bashrc
      
      * while using Spark 1.3.1:
      
      scala> sc.textFile("/home/root/.bashrc").name
      res0: String = /home/root/.bashrc
      
      scala> sc.binaryFiles("/home/root/.bashrc").name
      res1: String = /home/root/.bashrc
      
      Author: Yaron Weinsberg <wyaron@gmail.com>
      Author: yaron <yaron@il.ibm.com>
      
      Closes #10456 from wyaron/master.
    • [SPARK-12231][SQL] create a combineFilters' projection when we call buildPartitionedTableScan · fd50df41
      Kevin Yu authored
      Hello Michael & All:
      
      We had some issues submitting the new code in the other PR (#10299), so we closed that PR and opened this one with the fix.

      The previous failure occurred because, when there is a filter that is not pushed down (the "left-over" filter), the projection for the scan could differ from the original projection in its elements or their ordering.

      With the new code, the approach to solving this problem is:

      Insert a new Project if the "left-over" filter is nonempty and (the original projection is not empty and the projection for the scan has more than one element, which could otherwise cause a different ordering in the projection).

      We added 3 test cases to cover the cases that would otherwise fail.
      
      Author: Kevin Yu <qyu@us.ibm.com>
      
      Closes #10388 from kevinyu98/spark-12231.