  1. Oct 08, 2015
    • Revert [SPARK-8654] [SQL] Fix Analysis exception when using NULL IN · a8226a9f
      Michael Armbrust authored
      This reverts commit dcbd58a9 from #8983
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #9034 from marmbrus/revert8654.
    • [SPARK-10337] [SQL] fix hive views on non-hive-compatible tables. · af2a5544
      Wenchen Fan authored
      Add a new config to handle this special case.
      
      Author: Wenchen Fan <cloud0fan@163.com>
      
      Closes #8990 from cloud-fan/view-master.
    • [SPARK-10887] [SQL] Build HashedRelation outside of HashJoinNode. · 82d275f2
      Yin Huai authored
      This PR refactors `HashJoinNode` to take an existing `HashedRelation`, so we can reuse this node for both `ShuffledHashJoin` and `BroadcastHashJoin`.
      
      https://issues.apache.org/jira/browse/SPARK-10887
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #8953 from yhuai/SPARK-10887.
    • [SPARK-11006] Rename NullColumnAccess as NullColumnAccessor · 2a6f614c
      tedyu authored
      davies: I think NullColumnAccessor follows the same naming convention as the other accessors.
      
      Author: tedyu <yuzhihong@gmail.com>
      
      Closes #9028 from tedyu/master.
    • [SPARK-7770] [ML] GBT validationTol change to compare with relative or absolute error · 22683560
      Yanbo Liang authored
      GBT now compares the validation error against a tolerance that can be either relative or absolute, where the relative tolerance is measured against the current loss on the training set.
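
      A minimal sketch of the check this describes, with illustrative names (not Spark's exact internals):
      ```
      // Hedged sketch: early-stopping test for GBT, switching between a
      // relative tolerance (scaled by the current loss) and an absolute one.
      def shouldStopEarly(
          currentLoss: Double,
          newLoss: Double,
          validationTol: Double,
          relative: Boolean): Boolean = {
        val improvement = currentLoss - newLoss
        if (relative) improvement < validationTol * math.max(math.abs(currentLoss), 1e-10)
        else improvement < validationTol
      }
      ```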
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #8549 from yanboliang/spark-7770.
    • [SPARK-9718] [ML] linear regression training summary all columns · 0903c648
      Holden Karau authored
      LinearRegression training summary: The transformed dataset should hold all columns, not just selected ones like prediction and label. There is no real need to remove some, and the user may find them useful.
      
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #8564 from holdenk/SPARK-9718-LinearRegressionTrainingSummary-all-columns.
    • [SPARK-8654] [SQL] Fix Analysis exception when using NULL IN (...) · dcbd58a9
      Dilip Biswal authored
      In the analysis phase, while processing the rules for the IN predicate, we
      compare the in-list types to the LHS expression type and generate a cast
      operation if necessary. In the case of NULL [NOT] IN expr1, we end up
      generating a cast from the in-list types to NULL, such as cast(1 as NULL),
      which is not a valid cast.

      The fix is to not generate such a cast when the LHS type is a NullType;
      instead, we translate the expression to Literal(null).
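
      A hedged sketch of the rule's shape, with simplified stand-ins for the Catalyst classes (illustration only, not the exact code):
      ```
      // Minimal stand-ins for Catalyst types, for illustration only.
      sealed trait DataType
      case object NullType extends DataType
      case object BooleanType extends DataType
      sealed trait Expression
      case class Literal(value: Any, dataType: DataType) extends Expression
      case class In(value: Expression, list: Seq[Expression]) extends Expression

      // If the LHS of IN is a NullType literal, the predicate always evaluates
      // to null, so rewrite it to Literal(null) rather than casting the
      // in-list expressions to NullType.
      def rewriteNullIn(e: Expression): Expression = e match {
        case In(Literal(null, NullType), _) => Literal(null, BooleanType)
        case other => other
      }
      ```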
      
      Author: Dilip Biswal <dbiswal@us.ibm.com>
      
      Closes #8983 from dilipbiswal/spark_8654.
    • [SPARK-10998] [SQL] Show non-children in default Expression.toString · 5c9fdf74
      Michael Armbrust authored
      It's pretty hard to debug problems with expressions when you can't see all the arguments.
      
      Before: `invoke()`
      After: `invoke(inputObject#1, intField, IntegerType)`
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #9022 from marmbrus/expressionToString.
    • [SPARK-10836] [SPARKR] Added sort(x, decreasing, col, ... ) method to DataFrame · e8f90d9d
      Narine Kokhlikyan authored
      The sort function can be used as an alternative to arrange(...).
      As arguments it accepts x (a DataFrame), decreasing (TRUE/FALSE, or a vector of orderings for the columns), and the list of columns, given as string names.
      
      for example:
      ```
      sort(df, TRUE, "col1","col2","col3","col5") # for example, if we want to sort some of the columns in the same order

      sort(df, decreasing=TRUE, "col1")
      sort(df, decreasing=c(TRUE,FALSE), "col1","col2")
      ```
      
      Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com>
      
      Closes #8920 from NarineK/sparkrsort.
    • [SPARK-10987] [YARN] Workaround for missing netty rpc disconnection event. · 56a9692f
      Marcelo Vanzin authored
      In YARN client mode, when the AM connects to the driver, it may be the case
      that the driver never needs to send a message back to the AM (i.e., no
      dynamic allocation or preemption). This triggers an issue in the netty rpc
      backend where no disconnection event is sent to endpoints, and the AM never
      exits after the driver shuts down.
      
      The real fix is too complicated, so this is a quick hack to unblock YARN
      client mode until we can work on the real fix. It forces the driver to
      send a message to the AM when the AM registers, thus establishing that
      connection and enabling the disconnection event when the driver goes
      away.
      
      Also, a minor side issue: when the executor is shutting down, it needs
      to send an "ack" back to the driver when using the netty rpc backend; but
      that "ack" wasn't being sent because the handler was shutting down the rpc
      env before returning. So this adds a change to delay the shutdown a little
      bit, allowing the ack to be sent back.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #9021 from vanzin/SPARK-10987.
    • [SPARK-5775] [SPARK-5508] [SQL] Re-enable Hive Parquet array reading tests · 2df882ef
      Cheng Lian authored
      Since SPARK-5508 has already been fixed.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #8999 from liancheng/spark-5775.enable-array-tests.
    • [SPARK-10999] [SQL] Coalesce should be able to handle UnsafeRow · 59b0606f
      Cheng Lian authored
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #9024 from liancheng/spark-10999.coalesce-unsafe-row-handling.
    • [SPARK-10883] Add a note about how to build Spark sub-modules (reactor) · 60150cf0
      Jean-Baptiste Onofré authored
      Author: Jean-Baptiste Onofré <jbonofre@apache.org>
      
      Closes #8993 from jbonofre/SPARK-10883-2.
    • Akka framesize units should be specified · cd28139c
      admackin authored
      The 1.4 docs noted that the units were MB; I have assumed this is still the case.
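
      For reference, a hedged example of setting the option via SparkConf (the value is interpreted as MB in Spark 1.x):
      ```
      import org.apache.spark.SparkConf

      // spark.akka.frameSize is specified in MB, per the 1.4 docs cited above.
      val conf = new SparkConf()
        .setAppName("frame-size-example")
        .set("spark.akka.frameSize", "128") // allow RPC messages up to 128 MB
      ```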
      
      Author: admackin <admackin@users.noreply.github.com>
      
      Closes #9025 from admackin/master.
    • [SPARK-7869][SQL] Adding Postgres JSON and JSONb data types support · b8f849b5
      0x0FFF authored
      This PR addresses [SPARK-7869](https://issues.apache.org/jira/browse/SPARK-7869)
      
      Before the patch, attempting to load a table from Postgres with a JSON/JSONB datatype caused the error `java.sql.SQLException: Unsupported type 1111`.
      The Postgres data types JSON and JSONB are now mapped to String on the Spark side, so they can be loaded into a DataFrame and processed by Spark.
      
      Example
      
      Postgres:
      ```
      create table test_json  (id int, value json);
      create table test_jsonb (id int, value jsonb);
      
      insert into test_json (id, value) values
      (1, '{"field1":"value1","field2":"value2","field3":[1,2,3]}'::json),
      (2, '{"field1":"value3","field2":"value4","field3":[4,5,6]}'::json),
      (3, '{"field3":"value5","field4":"value6","field3":[7,8,9]}'::json);
      
      insert into test_jsonb (id, value) values
      (4, '{"field1":"value1","field2":"value2","field3":[1,2,3]}'::jsonb),
      (5, '{"field1":"value3","field2":"value4","field3":[4,5,6]}'::jsonb),
      (6, '{"field3":"value5","field4":"value6","field3":[7,8,9]}'::jsonb);
      ```
      
      PySpark:
      ```
      >>> import json
      >>> df1 = sqlContext.read.jdbc("jdbc:postgresql://127.0.0.1:5432/test?user=testuser", "test_json")
      >>> df1.map(lambda x: (x.id, json.loads(x.value))).map(lambda (id, value): (id, value.get('field3'))).collect()
      [(1, [1, 2, 3]), (2, [4, 5, 6]), (3, [7, 8, 9])]
      >>> df2 = sqlContext.read.jdbc("jdbc:postgresql://127.0.0.1:5432/test?user=testuser", "test_jsonb")
      >>> df2.map(lambda x: (x.id, json.loads(x.value))).map(lambda (id, value): (id, value.get('field1'))).collect()
      [(4, u'value1'), (5, u'value3'), (6, None)]
      ```
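
      On the Spark side, the change amounts to a dialect-level type mapping. A hedged sketch in the shape of Spark's JdbcDialect API (Postgres surfaces JSON/JSONB as java.sql.Types.OTHER, i.e. 1111):
      ```
      import org.apache.spark.sql.jdbc.JdbcDialect
      import org.apache.spark.sql.types._

      // Sketch: map Postgres' json/jsonb columns to StringType so they load
      // as plain strings instead of failing with "Unsupported type 1111".
      object PostgresJsonSketch extends JdbcDialect {
        override def canHandle(url: String): Boolean =
          url.startsWith("jdbc:postgresql")

        override def getCatalystType(
            sqlType: Int, typeName: String, size: Int,
            md: MetadataBuilder): Option[DataType] = {
          if (sqlType == java.sql.Types.OTHER &&
              (typeName == "json" || typeName == "jsonb")) Some(StringType)
          else None
        }
      }
      ```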
      
      Author: 0x0FFF <programmerag@gmail.com>
      
      Closes #8948 from 0x0FFF/SPARK-7869.
  2. Oct 07, 2015
    • [SPARK-9774] [ML] [PYSPARK] Add python api for ml regression isotonicregression · 3aff0866
      Holden Karau authored
      Add the Python API for IsotonicRegression.
      
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #8214 from holdenk/SPARK-9774-add-python-api-for-ml-regression-isotonicregression.
    • [SPARK-10064] [ML] Parallelize decision tree bin split calculations · 1bc435ae
      Nathan Howell authored
      Reimplement `DecisionTree.findSplitsBins` via `RDD` to parallelize bin calculation.
      
      With large feature spaces the current implementation is very slow. This change limits the features that are distributed (or collected) to just the continuous features, and performs the split calculations in parallel. It completes on a real multi-terabyte dataset in less than a minute instead of multiple hours.
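
      A rough sketch of the idea under stated assumptions (the quantile logic is simplified; the real per-feature code lives in `DecisionTree.findSplitsBins`):
      ```
      import org.apache.spark.rdd.RDD

      // Hedged sketch: distribute the continuous features and compute each
      // feature's candidate split thresholds in parallel, instead of looping
      // over all features on the driver.
      def findSplitsInParallel(
          sampledInput: RDD[Array[Double]],
          continuousFeatures: Seq[Int],
          numBins: Int): Map[Int, Array[Double]] = {
        val sc = sampledInput.sparkContext
        val samples = sampledInput.collect() // the sample is small by construction
        sc.parallelize(continuousFeatures)
          .map { f =>
            val values = samples.map(_(f)).sorted
            // take (numBins - 1) evenly spaced quantiles as split candidates
            val splits = (1 until numBins)
              .map(i => values((i * values.length) / numBins))
              .distinct.toArray
            f -> splits
          }
          .collectAsMap().toMap
      }
      ```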
      
      Author: Nathan Howell <nhowell@godaddy.com>
      
      Closes #8246 from NathanHowell/SPARK-10064.
    • [SPARK-10917] [SQL] improve performance of complex type in columnar cache · 075a0b65
      Davies Liu authored
      This PR improves the performance of complex types in the columnar cache by using UnsafeProjection instead of KryoSerializer.

      A simple benchmark shows that this PR could improve the performance of scanning a cached table with complex columns by 15x (compared to Spark 1.5).
      
      Here is the code used to benchmark:
      
      ```
      df = sc.range(1<<23).map(lambda i: Row(a=Row(b=i, c=str(i)), d=range(10), e=dict(zip(range(10), [str(i) for i in range(10)])))).toDF()
      df.write.parquet("table")
      ```
      ```
      df = sqlContext.read.parquet("table")
      df.cache()
      df.count()
      t = time.time()
      print df.select("*")._jdf.queryExecution().toRdd().count()
      print time.time() - t
      ```
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #8971 from davies/complex.
    • [SPARK-10738] [ML] Refactoring `Instance` out from LOR and LIR, and also cleaning up some code · dd36ec6b
      DB Tsai authored
      Refactoring `Instance` case class out from LOR and LIR, and also cleaning up some code.
      
      Author: DB Tsai <dbt@netflix.com>
      
      Closes #8853 from dbtsai/refactoring.
    • [SPARK-9702] [SQL] Use Exchange to implement logical Repartition operator · 7e2e2682
      Josh Rosen authored
      This patch allows `Repartition` to support UnsafeRows. This is accomplished by implementing the logical `Repartition` operator in terms of `Exchange` and a new `RoundRobinPartitioning`.
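
      A hedged sketch of what round-robin distribution means at the RDD level (the actual change implements it inside `Exchange`, not like this):
      ```
      import org.apache.spark.HashPartitioner
      import org.apache.spark.rdd.RDD
      import scala.reflect.ClassTag

      // Deal rows out evenly across partitions, ignoring their content.
      // (Int keys hash to themselves, so key i % n lands in partition i % n.)
      def roundRobinRepartition[T: ClassTag](rdd: RDD[T], numPartitions: Int): RDD[T] = {
        rdd.zipWithIndex()
          .map { case (row, i) => ((i % numPartitions).toInt, row) }
          .partitionBy(new HashPartitioner(numPartitions))
          .values
      }
      ```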
      
      Author: Josh Rosen <joshrosen@databricks.com>
      Author: Liang-Chi Hsieh <viirya@appier.com>
      
      Closes #8083 from JoshRosen/SPARK-9702.
    • [SPARK-10980] [SQL] fix bug in create Decimal · 37526aca
      Davies Liu authored
      The created decimal is wrong when using `Decimal(unscaled, precision, scale)` with unscaled > 1e18, precision > 18, and scale > 0.

      This bug has existed since the beginning.
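
      To illustrate the intended semantics (a hedged sketch, not the actual patch): a decimal built from (unscaled, precision, scale) represents unscaled * 10^(-scale), and once the unscaled value needs more than 18 digits it can no longer take the compact Long-backed path:
      ```
      // 2e18 has 19 digits, past the range the compact representation covers,
      // so the value must be built through BigDecimal, respecting the scale.
      val unscaled = 2000000000000000000L
      val scale = 2
      val asBigDecimal = BigDecimal(unscaled, scale)
      println(asBigDecimal) // 20000000000000000.00
      ```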
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #9014 from davies/fix_decimal.
    • [SPARK-10490] [ML] Consolidate the Cholesky solvers in WeightedLeastSquares and ALS · 7bf07faa
      Yanbo Liang authored
      Consolidate the Cholesky solvers in WeightedLeastSquares and ALS.
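
      For context, a self-contained sketch of the shared math: solving the normal equations A x = b, with A symmetric positive definite, via a Cholesky factorization (illustrative only; the consolidated Spark code delegates to LAPACK):
      ```
      // Minimal dense Cholesky solve: factor A = L * L^T, then solve
      // L y = b (forward) and L^T x = y (backward).
      def choleskySolve(a: Array[Array[Double]], b: Array[Double]): Array[Double] = {
        val n = b.length
        val l = Array.ofDim[Double](n, n)
        for (i <- 0 until n; j <- 0 to i) {
          val s = (0 until j).map(k => l(i)(k) * l(j)(k)).sum
          l(i)(j) = if (i == j) math.sqrt(a(i)(i) - s) else (a(i)(j) - s) / l(j)(j)
        }
        val y = new Array[Double](n)
        for (i <- 0 until n)
          y(i) = (b(i) - (0 until i).map(k => l(i)(k) * y(k)).sum) / l(i)(i)
        val x = new Array[Double](n)
        for (i <- (n - 1) to 0 by -1)
          x(i) = (y(i) - (i + 1 until n).map(k => l(k)(i) * x(k)).sum) / l(i)(i)
        x
      }
      ```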
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #8936 from yanboliang/spark-10490.
    • [SPARK-10982] [SQL] Rename ExpressionAggregate -> DeclarativeAggregate. · 6dbfd7ec
      Reynold Xin authored
      DeclarativeAggregate matches more closely with the ImperativeAggregate we already have.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #9013 from rxin/SPARK-10982.
    • [SPARK-10779] [PYSPARK] [MLLIB] Set initialModel for KMeans model in PySpark (spark.mllib) · da936fbb
      Evan Chen authored
      Provide initialModel param for pyspark.mllib.clustering.KMeans
      
      Author: Evan Chen <chene@us.ibm.com>
      
      Closes #8967 from evanyc15/SPARK-10779-pyspark-mllib.
    • [SPARK-10679] [CORE] javax.jdo.JDOFatalUserException in executor · 713e4f44
      navis.ryu authored
      HadoopRDD throws an exception in the executor, something like the following.
      ```
      15/09/17 18:51:21 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
      15/09/17 18:51:21 INFO metastore.ObjectStore: ObjectStore, initialize called
      15/09/17 18:51:21 WARN metastore.HiveMetaStore: Retrying creating default database after error: Class org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
      javax.jdo.JDOFatalUserException: Class org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
      	at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)
      	at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
      	at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
      	at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:365)
      	at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:394)
      	at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:291)
      	at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:258)
      	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
      	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
      	at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:57)
      	at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
      	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:593)
      	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:571)
      	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
      	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
      	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
      	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
      	at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
      	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199)
      	at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
      	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
      	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
      	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
      	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
      	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
      	at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
      	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
      	at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234)
      	at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
      	at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:166)
      	at org.apache.hadoop.hive.ql.plan.PlanUtils.configureJobPropertiesForStorageHandler(PlanUtils.java:803)
      	at org.apache.hadoop.hive.ql.plan.PlanUtils.configureInputJobPropertiesForStorageHandler(PlanUtils.java:782)
      	at org.apache.spark.sql.hive.HadoopTableReader$.initializeLocalJobConfFunc(TableReader.scala:298)
      	at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$12.apply(TableReader.scala:274)
      	at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$12.apply(TableReader.scala:274)
      	at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
      	at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
      	at scala.Option.map(Option.scala:145)
      	at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
      	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:220)
      	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
      	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
      	at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
      	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
      	at org.apache.spark.scheduler.Task.run(Task.scala:88)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      	at java.lang.Thread.run(Thread.java:745)
      ```
      
      Author: navis.ryu <navis@apache.org>
      
      Closes #8804 from navis/SPARK-10679.
    • [SPARK-10856][SQL] Mapping TimestampType to DATETIME for SQL Server jdbc dialect · c14aee4d
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-10856
      
      For Microsoft SQL Server, TimestampType should be mapped to DATETIME instead of TIMESTAMP.
      
      Related information for the datatype mapping: https://msdn.microsoft.com/en-us/library/ms378878(v=sql.110).aspx
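
      The change boils down to a dialect-level type override. A hedged sketch in the shape of Spark's JdbcDialect API:
      ```
      import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcType}
      import org.apache.spark.sql.types._

      // Sketch: emit DATETIME instead of the default TIMESTAMP when writing
      // TimestampType columns to SQL Server.
      object MsSqlServerDialectSketch extends JdbcDialect {
        override def canHandle(url: String): Boolean =
          url.startsWith("jdbc:sqlserver")

        override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
          case TimestampType => Some(JdbcType("DATETIME", java.sql.Types.TIMESTAMP))
          case _ => None
        }
      }
      ```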
      
      Author: Liang-Chi Hsieh <viirya@appier.com>
      
      Closes #8978 from viirya/mysql-jdbc-timestamp.
    • [SPARK-10300] [BUILD] [TESTS] Add support for test tags in run-tests.py. · 94fc57af
      Marcelo Vanzin authored
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #8775 from vanzin/SPARK-10300.
    • [SPARK-10941] [SQL] Refactor AggregateFunction2 and AlgebraicAggregate interfaces to improve code clarity · a9ecd061
      Josh Rosen authored
      
      This patch refactors several of the Aggregate2 interfaces in order to improve code clarity.
      
      The biggest change is a refactoring of the `AggregateFunction2` class hierarchy. In the old code, we had a class named `AlgebraicAggregate` that inherited from `AggregateFunction2`, added a new set of methods, then banned the use of the inherited methods. I found this to be fairly confusing.
      
      If you look carefully at the existing code, you'll see that subclasses of `AggregateFunction2` fall into two disjoint categories: imperative aggregation functions which directly extended `AggregateFunction2` and declarative, expression-based aggregate functions which extended `AlgebraicAggregate`. In order to make this more explicit, this patch refactors things so that `AggregateFunction2` is a sealed abstract class with two subclasses, `ImperativeAggregateFunction` and `ExpressionAggregateFunction`. The superclass, `AggregateFunction2`, now only contains methods and fields that are common to both subclasses.
      
      After making this change, I updated the various AggregationIterator classes to comply with this new naming scheme. I also performed several small renamings in the aggregate interfaces themselves in order to improve clarity and rewrote or expanded a number of comments.
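
      A simplified sketch of the resulting hierarchy (stand-in types; the real classes carry many more members):
      ```
      // Minimal stand-ins for Spark's internal types, for illustration only.
      trait InternalRow
      trait MutableRow extends InternalRow
      trait Expression

      sealed abstract class AggregateFunction2

      // Imperative aggregates mutate a buffer row directly.
      abstract class ImperativeAggregateFunction extends AggregateFunction2 {
        def initialize(buffer: MutableRow): Unit
        def update(buffer: MutableRow, input: InternalRow): Unit
        def eval(buffer: InternalRow): Any
      }

      // Declarative aggregates are defined entirely with expressions.
      abstract class ExpressionAggregateFunction extends AggregateFunction2 {
        def initialValues: Seq[Expression]
        def updateExpressions: Seq[Expression]
        def evaluateExpression: Expression
      }
      ```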
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #8973 from JoshRosen/tungsten-agg-comments.
    • [SPARK-9841] [ML] Make clear public · 5be5d247
      Holden Karau authored
      It is currently impossible to clear Param values once set. It would be helpful to be able to.
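
      A hedged usage sketch on the Scala side (assuming the standard ml Params API):
      ```
      import org.apache.spark.ml.classification.LogisticRegression

      // Set a Param, then clear it so the estimator falls back to its default.
      val lr = new LogisticRegression().setMaxIter(50)
      lr.clear(lr.maxIter)                 // maxIter is now unset...
      println(lr.getOrDefault(lr.maxIter)) // ...so the default (100) applies
      ```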
      
      Author: Holden Karau <holden@pigscanfly.ca>
      
      Closes #8619 from holdenk/SPARK-9841-params-clear-needs-to-be-public.
    • [SPARK-10964] [YARN] Correctly register the AM with the driver. · 6ca27f85
      Marcelo Vanzin authored
      The `self` method returns null when called from the constructor;
      instead, registration should happen in the `onStart` method, at
      which point the `self` reference has already been initialized.
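
      A hedged sketch of the pattern with simplified stand-in types (the real endpoint is Spark-internal):
      ```
      // Minimal stand-ins for the RPC types, for illustration only.
      trait RpcEndpointRef { def send(msg: Any): Unit }
      trait RpcEndpoint {
        def self: RpcEndpointRef // null until the endpoint is registered
        def onStart(): Unit = {}
      }
      case class RegisterClusterManager(am: RpcEndpointRef)

      // Registering from the constructor would capture self == null; doing it
      // in onStart is safe because the endpoint is registered by then.
      abstract class AMEndpoint(driver: RpcEndpointRef) extends RpcEndpoint {
        override def onStart(): Unit = {
          driver.send(RegisterClusterManager(self)) // self is valid here
        }
      }
      ```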
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #9005 from vanzin/SPARK-10964.
    • [SPARK-10812] [YARN] Fix shutdown of token renewer. · 4b747551
      Marcelo Vanzin authored
      A recent change to fix the referenced bug caused this exception in
      the `SparkContext.stop()` path:
      
      ```
      org.apache.spark.SparkException: YarnSparkHadoopUtil is not available in non-YARN mode!
              at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil$.get(YarnSparkHadoopUtil.scala:167)
              at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:182)
              at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:440)
              at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1579)
              at org.apache.spark.SparkContext$$anonfun$stop$7.apply$mcV$sp(SparkContext.scala:1730)
              at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1185)
              at org.apache.spark.SparkContext.stop(SparkContext.scala:1729)
      ```
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #8996 from vanzin/SPARK-10812.
    • [SPARK-10966] [SQL] Codegen framework cleanup · f5d154bc
      Michael Armbrust authored
      This PR is mostly cosmetic and cleans up some warts in codegen (nearly all of which were inherited from the original quasiquote version).
       - Add line numbers to errors (in stack traces when debug logging is on, and always for compile failures)
       - Use a variable for the input row instead of hardcoding "i" everywhere
       - Rename `primitive` -> `value` (since it's often actually an object)
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #9006 from marmbrus/codegen-cleanup.
    • [SPARK-10952] Only add hive to classpath if HIVE_HOME is set. · 9672602c
      Kevin Cox authored
      Currently, if it isn't set, the script scans `/lib/*` and adds every dir to the
      classpath, which makes the env too large and causes every command called
      afterwards to fail.
      
      Author: Kevin Cox <kevincox@kevincox.ca>
      
      Closes #8994 from kevincox/kevincox-only-add-hive-to-classpath-if-var-is-set.
    • [SPARK-10752] [SPARKR] Implement corr() and cov in DataFrameStatFunctions. · f57c63d4
      Sun Rui authored
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #8869 from sun-rui/SPARK-10752.
    • [SPARK-10669] [DOCS] Link to each language's API in codetabs in ML docs: spark.mllib · 27cdde2f
      Xin Ren authored
      In the Markdown docs for the spark.mllib Programming Guide, we have code examples with codetabs for each language. We should link to each language's API docs within the corresponding codetab, but we are inconsistent about this. For an example of what we want to do, see the "ChiSqSelector" section in https://github.com/apache/spark/blob/64743870f23bffb8d96dcc8a0181c1452782a151/docs/mllib-feature-extraction.md
      This JIRA is just for spark.mllib, not spark.ml.
      
      Please let me know if more work is needed, thanks a lot.
      
      Author: Xin Ren <iamshrek@126.com>
      
      Closes #8977 from keypointt/SPARK-10669.