  1. Jan 05, 2016
  2. Jan 04, 2016
    • [SPARK-12600][SQL] follow up: add range check for DecimalType · b634901b
      Reynold Xin authored
      This addresses davies' code review feedback in https://github.com/apache/spark/pull/10559
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #10586 from rxin/remove-deprecated-sql-followup.
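
      A minimal sketch of what such a range check can look like, assuming only Spark SQL's documented maximum decimal precision of 38; the helper below is illustrative, not the patch's actual code:

      ```scala
      // Hypothetical range check for decimal precision/scale. MaxPrecision = 38
      // matches Spark SQL's documented limit; everything else is illustrative.
      object DecimalTypeBounds {
        val MaxPrecision = 38

        def checkedPrecisionScale(precision: Int, scale: Int): (Int, Int) = {
          require(precision > 0 && precision <= MaxPrecision,
            s"Decimal precision $precision must be between 1 and $MaxPrecision")
          require(scale >= 0 && scale <= precision,
            s"Decimal scale $scale must be between 0 and precision $precision")
          (precision, scale)
        }
      }
      ```
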
    • [SPARKR][DOC] minor doc update for version in migration guide · 8896ec9f
      felixcheung authored
      Checked that the change is in Spark 1.6.0.

      cc shivaram
      
      Author: felixcheung <felixcheung_m@hotmail.com>
      
      Closes #10574 from felixcheung/rwritemodedoc.
    • [SPARK-12480][SQL] add Hash expression that can calculate hash value for a group of expressions · b1a77123
      Wenchen Fan authored
      Just write the arguments into an unsafe row and use murmur3 to calculate the hash code.
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #10435 from cloud-fan/hash-expr.
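
      A conceptual sketch of the technique: the new expression writes the values into an unsafe row and murmur3-hashes the bytes. The Scala standard library's MurmurHash3 can approximate the idea, though it is not the code path Spark uses:

      ```scala
      import scala.util.hashing.MurmurHash3

      // Hash a group of heterogeneous values in order with murmur3.
      // Spark instead hashes the UnsafeRow's raw bytes; this is an analogy only.
      val values: Seq[Any] = Seq(1, "foo", 2.5)
      val groupHash: Int = MurmurHash3.orderedHash(values, 42) // 42 is an arbitrary seed
      println(s"murmur3 hash of the group: $groupHash")
      ```
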
    • [SPARK-12600][SQL] Remove deprecated methods in Spark SQL · 77ab49b8
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #10559 from rxin/remove-deprecated-sql.
    • [SPARK-12509][SQL] Fixed error messages for DataFrame correlation and covariance · fdfac22d
      Narine Kokhlikyan authored
      Currently, when we call corr or cov on a DataFrame with invalid input, we see these error messages for both corr and cov:
         -  "Currently cov supports calculating the covariance between two columns"
         -  "Covariance calculation for columns with dataType "[DataType Name]" not supported."
      
      I've fixed this issue by passing the function name as an argument. We could also do the input checks separately for each function, but I avoided that because it would duplicate code.
      
      Thanks!
      
      Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com>
      
      Closes #10458 from NarineK/sparksqlstatsmessages.
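
      A minimal sketch of the fix's shape, using hypothetical helper and parameter names (not Spark's exact internals): the shared input check receives the calling function's name, so corr no longer reports its errors as "cov".

      ```scala
      import org.apache.spark.sql.DataFrame
      import org.apache.spark.sql.types.{DoubleType, FloatType, IntegerType, LongType}

      // Shared check, parameterized by the caller's name ("corr" or "cov").
      def checkStatColumns(df: DataFrame, cols: Seq[String], functionName: String): Unit = {
        require(cols.size == 2,
          s"Currently $functionName supports calculating the statistic between two columns only")
        cols.foreach { name =>
          val dataType = df.schema(name).dataType
          require(Seq(DoubleType, FloatType, IntegerType, LongType).contains(dataType),
            s"$functionName calculation for columns with dataType $dataType not supported.")
        }
      }
      ```
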
    • [SPARK-12589][SQL] Fix UnsafeRowParquetRecordReader to properly set the row length. · 34de24ab
      Nong Li authored
      The reader was previously not setting the row length, meaning it was wrong if there were variable-length columns. This problem does not usually manifest, since the value in the column is correct and projecting the row fixes the issue.
      
      Author: Nong Li <nong@databricks.com>
      
      Closes #10576 from nongli/spark-12589.
    • [SPARK-12541] [SQL] support cube/rollup as function · d084a2de
      Davies Liu authored
      This PR enables cube/rollup as functions, so they can be used like this:
      ```
      select a, b, sum(c) from t group by rollup(a, b)
      ```
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #10522 from davies/rollup.
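
      For comparison, a sketch of the DataFrame-side equivalents, which have been available since Spark 1.4 (assuming a DataFrame `df` with columns a, b, and c):

      ```scala
      import org.apache.spark.sql.DataFrame

      // DataFrame API equivalents of `group by rollup(a, b)` / `group by cube(a, b)`.
      def rollupSum(df: DataFrame): DataFrame = df.rollup("a", "b").sum("c")
      def cubeSum(df: DataFrame): DataFrame = df.cube("a", "b").sum("c")
      ```
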
    • [SPARK-9622][ML] DecisionTreeRegressor: provide variance of prediction · 93ef9b6a
      Yanbo Liang authored
      DecisionTreeRegressor will provide the variance of the prediction as a Double column.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #8866 from yanboliang/spark-9622.
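
      A hedged usage sketch, assuming a training DataFrame with the usual "features" and "label" columns; the output column name "variance" is our choice here:

      ```scala
      import org.apache.spark.ml.regression.DecisionTreeRegressor
      import org.apache.spark.sql.DataFrame

      // Request the per-prediction variance column introduced by this patch.
      def fitWithVariance(training: DataFrame): DataFrame = {
        val dtr = new DecisionTreeRegressor().setVarianceCol("variance")
        val model = dtr.fit(training)
        model.transform(training).select("prediction", "variance")
      }
      ```
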
    • [SPARK-11259][ML] Params.validateParams() should be called automatically · ba5f8185
      Yanbo Liang authored
      See JIRA: https://issues.apache.org/jira/browse/SPARK-11259
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #9224 from yanboliang/spark-11259.
    • [SPARK-12421][SQL] Prevent Internal/External row from exposing state. · 0171b71e
      Herman van Hovell authored
      It is currently possible to change the values of the supposedly immutable `GenericRow` and `GenericInternalRow` classes. This is caused by the fact that Scala's ArrayOps `toArray` (returned by calling `toSeq`) will return the backing array instead of a copy. This PR fixes this problem.
      
      This PR was inspired by https://github.com/apache/spark/pull/10374 by apo1.
      
      cc apo1 sarutak marmbrus cloud-fan nongli (everyone in the previous conversation).
      
      Author: Herman van Hovell <hvanhovell@questtec.nl>
      
      Closes #10553 from hvanhovell/SPARK-12421.
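
      A minimal sketch of the underlying Scala behavior this patch guards against (2.10/2.11-era semantics): `Array.toSeq` wraps the backing array instead of copying it, so mutations leak through the returned Seq.

      ```scala
      object RowAliasingDemo extends App {
        val backing: Array[Any] = Array(1, "a")
        val exposed: Seq[Any] = backing.toSeq // WrappedArray sharing `backing`
        backing(0) = 42
        println(exposed(0)) // prints 42: the "immutable" Seq observed the mutation
      }
      ```
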
    • [DOC] Adjust coverage for partitionBy() · 40d03960
      tedyu authored
      This is the related thread: http://search-hadoop.com/m/q3RTtO3ReeJ1iF02&subj=Re+partitioning+json+data+in+spark
      
      Michael suggested fixing the doc.
      
      Please review.
      
      Author: tedyu <yuzhihong@gmail.com>
      
      Closes #10499 from ted-yu/master.
    • [SPARK-12512][SQL] support column name with dot in withColumn() · 573ac55d
      Xiu Guo authored
      Author: Xiu Guo <xguo27@gmail.com>
      
      Closes #10500 from xguo27/SPARK-12512.
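
      An illustrative usage sketch, assuming an existing DataFrame `df`: withColumn now treats "a.b" as a literal column name rather than a nested-field reference, and backticks escape the dot on later lookups.

      ```scala
      import org.apache.spark.sql.DataFrame
      import org.apache.spark.sql.functions.lit

      def withDottedColumn(df: DataFrame): DataFrame = {
        val result = df.withColumn("a.b", lit(1))
        result.select("`a.b`") // backticks needed to reference the dotted name
      }
      ```
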
    • [SPARK-12608][STREAMING] Remove submitJobThreadPool since submitJob doesn't create a separate thread to wait for the job result · 43706bf8
      Shixiong Zhu authored
      Before #9264, `submitJob` would create a separate thread to wait for the job result, and `submitJobThreadPool` was a workaround in `ReceiverTracker` to run these waiting-for-job-result threads. Now that #9264 has been merged to master and resolves this blocking issue, `submitJobThreadPool` can be removed.
      
      Author: Shixiong Zhu <shixiong@databricks.com>
      
      Closes #10560 from zsxwing/remove-submitJobThreadPool.
    • [SPARK-12470] [SQL] Fix size reduction calculation · b504b6a9
      Pete Robbins authored
      Also, allocate only the required buffer size.
      
      Author: Pete Robbins <robbinspg@gmail.com>
      
      Closes #10421 from robbinspg/master.
    • [SPARK-12579][SQL] Force user-specified JDBC driver to take precedence · 6c83d938
      Josh Rosen authored
      Spark SQL's JDBC data source allows users to specify an explicit JDBC driver to load (using the `driver` argument), but in the current code it's possible that the user-specified driver will not be used when it comes time to actually create a JDBC connection.
      
      In a nutshell, the problem is that you might have multiple JDBC drivers on the classpath that claim to be able to handle the same subprotocol, so simply registering the user-provided driver class with our `DriverRegistry` and JDBC's `DriverManager` is not sufficient to ensure that it's actually used when creating the JDBC connection.
      
      This patch addresses this issue by first registering the user-specified driver with the DriverManager, then iterating over the driver manager's loaded drivers to obtain the correct driver and using it to create a connection (previously, we just called `DriverManager.getConnection()` directly).
      
      If a user did not specify a JDBC driver to use, then we call `DriverManager.getDriver` to figure out the class of the driver to use, then pass that class's name to executors. This guards against corner-case bugs where the driver and executor JVMs have different sets of JDBC drivers on their classpaths: previously, there was the (rare) potential for `DriverManager.getConnection()` to use different drivers on the driver and executors if the user had not explicitly specified a JDBC driver class and the classpaths differed.
      
      This patch is inspired by a similar patch that I made to the `spark-redshift` library (https://github.com/databricks/spark-redshift/pull/143), which contains its own modified fork of some of Spark's JDBC data source code (for cross-Spark-version compatibility reasons).
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10519 from JoshRosen/jdbc-driver-precedence.
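
      A sketch of the approach described above, with an assumed input name (`userSpecifiedDriverClass`): instead of letting `DriverManager.getConnection()` pick any driver that claims the URL, iterate the registered drivers and connect through the one whose class matches.

      ```scala
      import java.sql.{Connection, Driver, DriverManager}
      import java.util.Properties
      import scala.collection.JavaConverters._

      def connectWith(url: String, userSpecifiedDriverClass: String,
                      props: Properties): Connection = {
        val driver: Driver = DriverManager.getDrivers.asScala
          .find(_.getClass.getName == userSpecifiedDriverClass)
          .getOrElse(throw new IllegalStateException(
            s"Did not find registered driver with class $userSpecifiedDriverClass"))
        driver.connect(url, props)
      }
      ```
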
    • [SPARK-12486] Worker should kill the executors more forcefully if possible. · 8f659393
      Nong Li authored
      This patch updates the ExecutorRunner's termination path to use the new Java 8 API to terminate processes more forcefully if possible. If the executor is unhealthy, it would previously ignore the destroy() call. Presumably, the new Java API was added to handle cases like this.
      
      We could update the termination path in the future to use OS-specific commands for older Java versions.
      
      Author: Nong Li <nong@databricks.com>
      
      Closes #10438 from nongli/spark-12486-executors.
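
      A sketch of the Java 8 termination path described above: attempt a graceful destroy first, then fall back to `destroyForcibly()` if the process is still alive after a grace period (the timeout value is illustrative).

      ```scala
      import java.util.concurrent.TimeUnit

      def terminate(process: Process): Int = {
        process.destroy() // graceful; an unhealthy process may ignore this
        if (!process.waitFor(5, TimeUnit.SECONDS)) {
          process.destroyForcibly() // Java 8: forceful termination
        }
        process.waitFor() // block until the process is really gone
      }
      ```
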
    • [SPARK-12513][STREAMING] SocketReceiver hang in Netcat example · 962aac4d
      guoxu1231 authored
      Explicitly close the client-side socket connection before restarting the socket receiver.
      
      Author: guoxu1231 <guoxu1231@gmail.com>
      Author: Shawn Guo <guoxu1231@gmail.com>
      
      Closes #10464 from guoxu1231/SPARK-12513.
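
      A sketch of the fix's idea, with an illustrative receive loop (a real receiver would call store() rather than println): close the client socket in a finally block so the remote end, e.g. netcat, observes the disconnect before the receiver restarts.

      ```scala
      import java.net.Socket

      def receive(host: String, port: Int): Unit = {
        var socket: Socket = null
        try {
          socket = new Socket(host, port)
          val source = scala.io.Source.fromInputStream(socket.getInputStream, "UTF-8")
          source.getLines().foreach(println)
        } finally {
          if (socket != null) socket.close() // explicit close before restart
        }
      }
      ```
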