  1. Jan 30, 2016
    • [SPARK-6363][BUILD] Make Scala 2.11 the default Scala version · 289373b2
      Josh Rosen authored
      This patch changes Spark's build to make Scala 2.11 the default Scala version. To be clear, this does not mean that Spark will stop supporting Scala 2.10: users will still be able to compile Spark for Scala 2.10 by following the instructions on the "Building Spark" page; however, it does mean that Scala 2.11 will be the default Scala version used by our CI builds (including pull request builds).
      
      The Scala 2.11 compiler is faster than 2.10, so I think we'll be able to look forward to a slight speedup in our CI builds (it looks like it's about 2X faster for the Maven compile-only builds, for instance).
      
      After this patch is merged, I'll update Jenkins to add new compile-only jobs to ensure that Scala 2.10 compilation doesn't break.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10608 from JoshRosen/SPARK-6363.
    • [SPARK-13098] [SQL] remove GenericInternalRowWithSchema · dab246f7
      Wenchen Fan authored
      This class is only used for serialization of Python DataFrames. However, we don't require an internal row there, so `GenericRowWithSchema` can also do the job.
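      
      A minimal hedged sketch of what `GenericRowWithSchema` already offers (the schema and values here are made up for illustration): an external `Row` that still carries its schema, which is all the Python serialization path needs.
      ```scala
      import org.apache.spark.sql.Row
      import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
      import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
      
      // Illustrative schema and values only
      val schema = StructType(Seq(
        StructField("id", IntegerType),
        StructField("name", StringType)))
      val row: Row = new GenericRowWithSchema(Array[Any](1, "alice"), schema)
      row.schema                  // the attached StructType
      row.getAs[String]("name")   // field access by name works because the schema is present
      ```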
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #10992 from cloud-fan/python.
  2. Jan 29, 2016
    • [SPARK-12914] [SQL] generate aggregation with grouping keys · e6a02c66
      Davies Liu authored
      This PR adds support for grouping keys in the generated TungstenAggregate.
      
      Spilling and performance improvements for BytesToBytesMap will be done in a follow-up PR.
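      
      A usage sketch of the kind of query this now covers (assuming a `SQLContext` named `sqlContext`; the column names are illustrative):
      ```scala
      import org.apache.spark.sql.functions.sum
      
      // An aggregation with grouping keys, which TungstenAggregate can now handle
      // in generated code.
      val df = sqlContext.range(0, 1000000).selectExpr("id % 10 AS key", "id AS value")
      df.groupBy("key").agg(sum("value")).show()
      ```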
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #10855 from davies/gen_keys.
    • [SPARK-13071] Coalescing HadoopRDD overwrites existing input metrics · 12252d1d
      Andrew Or authored
      This issue is causing tests to fail consistently in master with Hadoop 2.6 / 2.7. This is because for Hadoop 2.5+ we overwrite existing values of `InputMetrics#bytesRead` in each call to `HadoopRDD#compute`. In the case of coalesce, e.g.
      ```
      sc.textFile(..., 4).coalesce(2).count()
      ```
      we will call `compute` multiple times in the same task, overwriting `bytesRead` values from previous calls to `compute`.
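      
      A self-contained sketch of the accumulate-instead-of-overwrite idea behind the fix (the types and method names are hypothetical, not the actual HadoopRDD/InputMetrics code):
      ```scala
      final class InputMetricsSketch {
        private var _bytesRead = 0L
        def bytesRead: Long = _bytesRead
        def setBytesRead(v: Long): Unit = { _bytesRead = v }
      }
      
      // Capture the bytes recorded by earlier compute() calls in the same task,
      // then add each reader's count on top instead of overwriting it.
      def onComputeStart(metrics: InputMetricsSketch): Long => Unit = {
        val existing = metrics.bytesRead
        bytesFromThisReader => metrics.setBytesRead(existing + bytesFromThisReader)
      }
      ```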
      
      For a regression test, see `InputOutputMetricsSuite.input metrics for old hadoop with coalesce`. I did not add a new regression test because it's impossible without significant refactoring; there's a lot of existing duplicate code in this corner of Spark.
      
      This was caused by #10835.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #10973 from andrewor14/fix-input-metrics-coalesce.
    • [SPARK-13088] Fix DAG viz in latest version of chrome · 70e69fc4
      Andrew Or authored
      Apparently Chrome removed `SVGElement.prototype.getTransformToElement`, which is used by our JS library dagre-d3 when creating edges. The real diff can be found here: https://github.com/andrewor14/dagre-d3/commit/7d6c0002e4c74b82a02c5917876576f71e215590, which is taken from the fix in the main repo: https://github.com/cpettitt/dagre-d3/commit/1ef067f1c6ad2e0980f6f0ca471bce998784b7b2
      
      Upstream issue: https://github.com/cpettitt/dagre-d3/issues/202
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #10986 from andrewor14/fix-dag-viz.
    • [SPARK-13096][TEST] Fix flaky verifyPeakExecutionMemorySet · e6ceac49
      Andrew Or authored
      Previously we would assert things before all events were guaranteed to have been processed. To fix this, just block until all events are actually processed, i.e. until the listener queue is empty.
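      
      A hedged sketch of the pattern (it assumes the test can reach SparkContext's internal listener bus, as Spark's own suites can):
      ```scala
      // Block until every posted event has been handled before asserting on
      // listener-derived state such as peak execution memory.
      sc.listenerBus.waitUntilEmpty(10 * 1000)   // wait up to 10 seconds
      // ...now run the actual assertions
      ```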
      
      https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/79/testReport/junit/org.apache.spark.util.collection/ExternalAppendOnlyMapSuite/spilling/
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #10990 from andrewor14/accum-suite-less-flaky.
    • [SPARK-13076][SQL] Rename ClientInterface -> HiveClient · 2cbc4128
      Reynold Xin authored
      And ClientWrapper -> HiveClientImpl.
      
      I have some follow-up pull requests to introduce a new internal catalog, and I think this new naming better reflects the functionality of the two classes.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #10981 from rxin/SPARK-13076.
    • [SPARK-13055] SQLHistoryListener throws ClassCastException · e38b0baa
      Andrew Or authored
      This is an existing issue uncovered recently by #10835. The exception occurred because the `SQLHistoryListener` receives all sorts of accumulators, not just the ones that represent SQL metrics. For example, the listener receives `internal.metrics.shuffleRead.remoteBlocksFetched`, which is an Int, and then proceeds to cast the Int to a Long, which fails.
      
      The fix is to mark accumulators representing SQL metrics using some internal metadata. Then we can identify which ones are SQL metrics and only process those in the `SQLHistoryListener`.
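      
      A simplified, self-contained sketch of the marking idea (the type and field names are made up, not Spark's internals):
      ```scala
      // Tag SQL metric accumulators with a flag and have the listener ignore the rest.
      case class AccumulatorInfoSketch(name: String, value: Any, isSqlMetric: Boolean)
      
      def updateSqlMetrics(accums: Seq[AccumulatorInfoSketch]): Seq[(String, Long)] =
        accums.collect {
          // Only SQL metrics are expected to hold Long values; other internal metrics
          // (e.g. shuffleRead.remoteBlocksFetched, an Int) are skipped instead of cast.
          case AccumulatorInfoSketch(name, value: Long, true) => name -> value
        }
      ```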
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #10971 from andrewor14/fix-sql-history.
    • [SPARK-12818] Polishes spark-sketch module · 2b027e9a
      Cheng Lian authored
      Fixes various minor code and Javadoc styling issues.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #10985 from liancheng/sketch-polishing.
    • [SPARK-12656] [SQL] Implement Intersect with Left-semi Join · 5f686cc8
      gatorsmile authored
      Our current Intersect physical operator simply delegates to RDD.intersect. We should remove the Intersect physical operator and instead transform a logical Intersect into a left-semi join followed by a distinct. This way, we can take advantage of all the benefits of the join implementations (e.g. managed memory, code generation, broadcast joins).
      
      After some research, I found that at least one mainstream RDBMS does the same: in its query plans, Intersect is replaced by a left-semi join. A left-semi join can also enable outer-join elimination in the optimizer, as shown in this PR: https://github.com/apache/spark/pull/10566
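      
      A simplified sketch of the rewrite using Catalyst's logical plan classes (an illustration, not the exact rule added by this PR):
      ```scala
      import org.apache.spark.sql.catalyst.expressions.{And, EqualNullSafe, Expression}
      import org.apache.spark.sql.catalyst.plans.LeftSemi
      import org.apache.spark.sql.catalyst.plans.logical.{Distinct, Join, LogicalPlan}
      
      // INTERSECT becomes a left-semi join on null-safe equality of the
      // corresponding output columns, followed by a distinct to drop duplicates.
      def rewriteIntersect(left: LogicalPlan, right: LogicalPlan): LogicalPlan = {
        val joinCond = left.output.zip(right.output)
          .map { case (l, r) => EqualNullSafe(l, r): Expression }
          .reduceLeftOption(And)
        Distinct(Join(left, right, LeftSemi, joinCond))
      }
      ```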
      
      Author: gatorsmile <gatorsmile@gmail.com>
      Author: xiaoli <lixiao1983@gmail.com>
      Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>
      
      Closes #10630 from gatorsmile/IntersectBySemiJoin.
    • [SPARK-13072] [SQL] simplify and improve murmur3 hash expression codegen · c5f745ed
      Wenchen Fan authored
      Simplify the generated code of the hash expression (remove several unnecessary local variables) and avoid null checks where possible.
      
      Generated code comparison for `hash(int, double, string, array<int>)`:
      **before:**
      ```
        public UnsafeRow apply(InternalRow i) {
          /* hash(input[0, int],input[1, double],input[2, string],input[3, array<int>],42) */
          int value1 = 42;
          /* input[0, int] */
          int value3 = i.getInt(0);
          if (!false) {
            value1 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashInt(value3, value1);
          }
          /* input[1, double] */
          double value5 = i.getDouble(1);
          if (!false) {
            value1 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashLong(Double.doubleToLongBits(value5), value1);
          }
          /* input[2, string] */
          boolean isNull6 = i.isNullAt(2);
          UTF8String value7 = isNull6 ? null : (i.getUTF8String(2));
          if (!isNull6) {
            value1 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(value7.getBaseObject(), value7.getBaseOffset(), value7.numBytes(), value1);
          }
          /* input[3, array<int>] */
          boolean isNull8 = i.isNullAt(3);
          ArrayData value9 = isNull8 ? null : (i.getArray(3));
          if (!isNull8) {
            int result10 = value1;
            for (int index11 = 0; index11 < value9.numElements(); index11++) {
              if (!value9.isNullAt(index11)) {
                final int element12 = value9.getInt(index11);
                result10 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashInt(element12, result10);
              }
            }
            value1 = result10;
          }
        }
      ```
      **after:**
      ```
        public UnsafeRow apply(InternalRow i) {
          /* hash(input[0, int],input[1, double],input[2, string],input[3, array<int>],42) */
          int value1 = 42;
          /* input[0, int] */
          int value3 = i.getInt(0);
          value1 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashInt(value3, value1);
          /* input[1, double] */
          double value5 = i.getDouble(1);
          value1 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashLong(Double.doubleToLongBits(value5), value1);
          /* input[2, string] */
          boolean isNull6 = i.isNullAt(2);
          UTF8String value7 = isNull6 ? null : (i.getUTF8String(2));
      
          if (!isNull6) {
            value1 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashUnsafeBytes(value7.getBaseObject(), value7.getBaseOffset(), value7.numBytes(), value1);
          }
      
          /* input[3, array<int>] */
          boolean isNull8 = i.isNullAt(3);
          ArrayData value9 = isNull8 ? null : (i.getArray(3));
          if (!isNull8) {
            for (int index10 = 0; index10 < value9.numElements(); index10++) {
              final int element11 = value9.getInt(index10);
              value1 = org.apache.spark.unsafe.hash.Murmur3_x86_32.hashInt(element11, value1);
            }
          }
      
          rowWriter14.write(0, value1);
          return result12;
        }
      ```
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #10974 from cloud-fan/codegen.
    • [SPARK-10873] Support column sort and search for History Server. · e4c1162b
      zhuol authored
      [SPARK-10873] Support column sort and search for the History Server using jQuery DataTables and the REST API. Before this commit, the history server generated hard-coded HTML and could not support search; sorting was also disabled if any application had more than one attempt. Supporting search and sort (over all applications rather than the 20 entries on the current page) greatly improves the user experience.
      
      1. Create historypage-template.html for displaying application information in DataTables.
      2. historypage.js uses jQuery to fetch data from the /api/v1/applications REST API and DataTables to display each application's information. For applications with more than one attempt, RowsGroup is used to merge such entries while still supporting sort and search.
      3. "duration" and "lastUpdated" fields are added to each application's "attempts" in the REST API.
      4. External JavaScript and CSS files for DataTables, RowsGroup and jQuery plugins are added, with their licenses clarified.
      
      Screenshots of how it looks now:
      
      History page view:
      ![historypage](https://cloud.githubusercontent.com/assets/11683054/12184383/89bad774-b55a-11e5-84e4-b0276172976f.png)
      
      Search:
      ![search](https://cloud.githubusercontent.com/assets/11683054/12184385/8d3b94b0-b55a-11e5-869a-cc0ef0a4242a.png)
      
      Sort by started time:
      ![sort-by-started-time](https://cloud.githubusercontent.com/assets/11683054/12184387/8f757c3c-b55a-11e5-98c8-577936366566.png)
      
      Author: zhuol <zhuol@yahoo-inc.com>
      
      Closes #10648 from zhuoliu/10873.
    • [SPARK-13032][ML][PYSPARK] PySpark support model export/import and take LinearRegression as example · e51b6eaa
      Yanbo Liang authored
      * Implement ```MLWriter/MLWritable/MLReader/MLReadable``` for PySpark.
      * Make ```LinearRegression``` support ```save/load``` as an example. After this is merged, the work for other transformers/estimators will be easy, and we can list and distribute the tasks to the community.
      
      cc mengxr jkbradley
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #10469 from yanboliang/spark-11939.
    • [SPARK-13031][SQL] cleanup codegen and improve test coverage · 55561e76
      Davies Liu authored
      1. Enable whole-stage codegen during tests even when only one operator supports it.
      2. Split doProduce() into two APIs: upstream() and doProduce().
      3. Generate a prefix for the fresh names of each operator.
      4. Pass UnsafeRow to the parent directly (avoid getters and creating the UnsafeRow again).
      5. Fix bugs and tests.
      
      This PR re-opens #10944 and fixes the bug.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #10977 from davies/gen_refactor.
    • [SPARK-13050][BUILD] Scalatest tags fail build with the addition of the sketch module · 8d3cc3de
      Alex Bozarth authored
      A dependency on the Spark test tags was left out of the sketch module pom file, causing builds to fail when test tags were used. This dependency is found in the pom file of every other module in Spark.
      
      Author: Alex Bozarth <ajbozart@us.ibm.com>
      
      Closes #10954 from ajbozarth/spark13050.
    • [SPARK-13067] [SQL] workaround for a weird scala reflection problem · 721ced28
      Wenchen Fan authored
      A simple workaround to avoid getting parameter types when converting a logical plan to JSON.
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #10970 from cloud-fan/reflection.
    • [SPARK-12968][SQL] Implement command to set current database · 66449b8d
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-12968
      
      Implement command to set current database.
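      
      A usage sketch (the database and table names are made up; assumes a SQLContext/HiveContext named `sqlContext`):
      ```scala
      sqlContext.sql("USE mydb")                            // set the current database
      sqlContext.sql("SELECT * FROM some_table").show()     // now resolved against mydb
      ```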
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      Author: Liang-Chi Hsieh <viirya@appier.com>
      
      Closes #10916 from viirya/ddl-use-database.
  3. Jan 28, 2016
    • b9dfdcc6
      Davies Liu authored
    • [SPARK-11955][SQL] Mark optional fields in merging schema for safely pushdowning filters in Parquet · 4637fc08
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-11955
      
      Currently we simply skip pushing down filters in Parquet if schema merging is enabled.
      
      However, we can mark particular fields in the merged schema so that filters on them can still be safely pushed down to Parquet.
      
      Author: Liang-Chi Hsieh <viirya@appier.com>
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #9940 from viirya/safe-pushdown-parquet-filters.
    • [SPARK-12749][SQL] add json option to parse floating-point types as DecimalType · 3a40c0e5
      Brandon Bradley authored
      I tried to add this via Jackson's `USE_BIG_DECIMAL_FOR_FLOATS` option, with no success.
      
      Added test for non-complex types. Should I add a test for complex types?
      
      Author: Brandon Bradley <bradleytastic@gmail.com>
      
      Closes #10936 from blbradley/spark-12749.
    • [SPARK-12401][SQL] Add integration tests for postgres enum types · abae889f
      Takeshi YAMAMURO authored
      We can handle PostgreSQL-specific enum types as strings in JDBC.
      So we should just add tests and close the corresponding JIRA ticket.
      
      Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>
      
      Closes #10596 from maropu/AddTestsInIntegration.
    • [SPARK-9835][ML] Implement IterativelyReweightedLeastSquares solver · df78a934
      Yanbo Liang authored
      Implement the ```IterativelyReweightedLeastSquares``` solver for GLMs. I consider it a solver rather than an estimator; it is only used internally, so I keep it ```private[ml]```.
      There are two limitations in the current implementation compared with R:
      * It cannot support a ```Tuple``` as the response for the ```Binomial``` family, such as in the following R code:
      ```
      glm( cbind(using, notUsing) ~  age + education + wantsMore , family = binomial)
      ```
      * It does not support ```offset```.
      
      Because ```RFormula``` does not support a ```Tuple``` label or an ```offset``` keyword, I simplified the implementation. Adding support for these two features is not very hard, and I can do it in a follow-up PR if necessary. Meanwhile, we can also add an R-like statistical summary for IRLS.
      The implementation refers to R, [statsmodels](https://github.com/statsmodels/statsmodels) and [sparkGLM](https://github.com/AlteryxLabs/sparkGLM).
      Please focus on the main structure and overlook minor issues/docs that I will update later. Any comments and opinions are appreciated.
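      
      For reference, a self-contained sketch of the general IRLS iteration (illustrative only, not the ```private[ml]``` implementation; it assumes Breeze on the classpath and a family-specific `reweight` callback that closes over the response):
      ```scala
      import breeze.linalg.{diag, norm, DenseMatrix, DenseVector}
      
      def irls(
          x: DenseMatrix[Double],
          // returns (case weights, working response) for the current coefficients
          reweight: DenseVector[Double] => (DenseVector[Double], DenseVector[Double]),
          maxIter: Int = 25,
          tol: Double = 1e-8): DenseVector[Double] = {
        var beta = DenseVector.zeros[Double](x.cols)
        var delta = Double.PositiveInfinity
        var iter = 0
        while (iter < maxIter && delta > tol) {
          val (w, z) = reweight(beta)
          val xtw = x.t * diag(w)                // X^T W
          val newBeta = (xtw * x) \ (xtw * z)    // solve the weighted normal equations
          delta = norm(newBeta - beta)
          beta = newBeta
          iter += 1
        }
        beta
      }
      ```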
      
      cc mengxr jkbradley
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #10639 from yanboliang/spark-9835.
    • [SPARK-13031] [SQL] cleanup codegen and improve test coverage · cc18a719
      Davies Liu authored
      1. Enable whole-stage codegen during tests even when only one operator supports it.
      2. Split doProduce() into two APIs: upstream() and doProduce().
      3. Generate a prefix for the fresh names of each operator.
      4. Pass UnsafeRow to the parent directly (avoid getters and creating the UnsafeRow again).
      5. Fix bugs and tests.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #10944 from davies/gen_refactor.
    • [SPARK-12926][SQL] SQLContext to display warning message when non-sql configs are being set · 67680396
      Tejas Patil authored
      Users unknowingly try to set core Spark configs in SQLContext and only later realise that it didn't work, e.g. sqlContext.sql("SET spark.shuffle.memoryFraction=0.4"). This PR adds a warning message when such operations are performed.
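      
      A simplified sketch of the check (illustrative, not the actual SQLContext code):
      ```scala
      // Warn when a SET key is a Spark core config rather than a Spark SQL config.
      def warnIfNonSqlConf(key: String, logWarning: String => Unit): Unit = {
        if (key.startsWith("spark.") && !key.startsWith("spark.sql.")) {
          logWarning(s"Attempt to set non-Spark-SQL config in SQLContext: key = $key. " +
            "This setting has no effect here; set it on SparkConf or spark-submit instead.")
        }
      }
      
      warnIfNonSqlConf("spark.shuffle.memoryFraction", println)   // triggers the warning
      ```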
      
      Author: Tejas Patil <tejasp@fb.com>
      
      Closes #10849 from tejasapatil/SPARK-12926.
    • [SPARK-12818][SQL] Specialized integral and string types for Count-min Sketch · 415d0a85
      Cheng Lian authored
      This PR is a follow-up of #10911. It adds specialized update methods for `CountMinSketch` so that we can avoid doing internal/external row format conversion in `DataFrame.countMinSketch()`.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #10968 from liancheng/cms-specialized.
    • Provide same info as in spark-submit --help · c2204436
      James Lohse authored
      This is already stated for --packages and --repositories. Without stating it for --jars, people expect a standard Java classpath to work, with expansion and a delimiter other than a comma. Currently this is only stated in the --help output for spark-submit: "Comma-separated list of local jars to include on the driver and executor classpaths."
      
      Author: James Lohse <jimlohse@users.noreply.github.com>
      
      Closes #10890 from jimlohse/patch-1.
  4. Jan 27, 2016
    • [SPARK-13045] [SQL] Remove ColumnVector.Struct in favor of ColumnarBatch.Row · 4a091232
      Nong Li authored
      These two classes became identical as the implementation progressed.
      
      Author: Nong Li <nong@databricks.com>
      
      Closes #10952 from nongli/spark-13045.
    • [HOTFIX] Fix Scala 2.11 compilation · d702f0c1
      Andrew Or authored
      by explicitly marking annotated parameters as vals (SI-8813).
      
      Caused by #10835.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #10955 from andrewor14/fix-scala211.
    • [SPARK-12865][SPARK-12866][SQL] Migrate SparkSQLParser/ExtendedHiveQlParser commands to new Parser · ef96cd3c
      Herman van Hovell authored
      This PR moves all the functionality provided by the SparkSQLParser/ExtendedHiveQlParser to the new Parser hierarchy (SparkQl/HiveQl). This also improves the current SET command parsing: the current implementation swallows ```set role ...``` and ```set autocommit ...``` commands; this PR respects these commands (and passes them on to Hive).
      
      This PR and https://github.com/apache/spark/pull/10723 end the use of Parser-Combinator parsers for SQL parsing. As a result we can also remove the ```AbstractSQLParser``` in Catalyst.
      
      The PR is marked WIP as long as it doesn't pass all tests.
      
      cc rxin viirya winningsix (this touches https://github.com/apache/spark/pull/10144)
      
      Author: Herman van Hovell <hvanhovell@questtec.nl>
      
      Closes #10905 from hvanhovell/SPARK-12866.
    • [SPARK-12938][SQL] DataFrame API for Bloom filter · 680afabe
      Wenchen Fan authored
      This PR integrates the Bloom filter from spark-sketch into DataFrame. This version resorts to RDD.aggregate for building the filter. A more performant UDAF version can be built in future follow-up PRs.
      
      This PR also adds two specialized `put` versions (`putBinary` and `putLong`) to `BloomFilter`, which makes it easier to build a Bloom filter over a `DataFrame`.
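      
      A usage sketch (the column name and sizing parameters are illustrative; assumes a `SQLContext` named `sqlContext`):
      ```scala
      // Build a Bloom filter over a DataFrame column via df.stat, then probe it.
      val df = sqlContext.range(0, 1000000)
      val bf = df.stat.bloomFilter("id", 1000000L, 0.03)   // expected items, false-positive rate
      bf.mightContain(42L)    // almost certainly true (Bloom filters have no false negatives)
      bf.mightContain(-1L)    // false with high probability (~3% false positives)
      ```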
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #10937 from cloud-fan/bloom-filter.
    • [SPARK-13021][CORE] Fail fast when custom RDDs violate RDD.partition's API contract · 32f74111
      Josh Rosen authored
      Spark's `Partition` and `RDD.partitions` APIs have a contract which requires custom implementations of `RDD.partitions` to ensure that for all `x`, `rdd.partitions(x).index == x`; in other words, the `index` reported by a partition needs to match its position in the partitions array.
      
      If a custom RDD implementation violates this contract, then Spark has the potential to become stuck in an infinite recomputation loop when recomputing a subset of an RDD's partitions, since the tasks that are actually run will not correspond to the missing output partitions that triggered the recomputation. Here's a link to a notebook which demonstrates this problem: https://rawgit.com/JoshRosen/e520fb9a64c1c97ec985/raw/5e8a5aa8d2a18910a1607f0aa4190104adda3424/Violating%2520RDD.partitions%2520contract.html
      
      In order to guard against this infinite loop behavior, this patch modifies Spark so that it fails fast and refuses to compute RDDs whose `partitions` violate the API contract.
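      
      A minimal sketch of the fail-fast check (illustrative; in Spark it would run where an RDD's partitions are first materialized):
      ```scala
      import org.apache.spark.Partition
      
      // Every partition's index must equal its position in the partitions array.
      def verifyPartitionIndices(partitions: Array[Partition]): Unit = {
        partitions.zipWithIndex.foreach { case (partition, index) =>
          require(partition.index == index,
            s"partitions($index).index == ${partition.index}, violating the RDD.partitions contract")
        }
      }
      ```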
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10932 from JoshRosen/SPARK-13021.
    • [SPARK-12895][SPARK-12896] Migrate TaskMetrics to accumulators · 87abcf7d
      Andrew Or authored
      The high level idea is that instead of having the executors send both accumulator updates and TaskMetrics, we should have them send only accumulator updates. This eliminates the need to maintain both code paths since one can be implemented in terms of the other. This effort is split into two parts:
      
      **SPARK-12895: Implement TaskMetrics using accumulators.** TaskMetrics is basically just a bunch of accumulable fields. This patch makes TaskMetrics a syntactic wrapper around a collection of accumulators so we don't need to send TaskMetrics from the executors to the driver.
      
      **SPARK-12896: Send only accumulator updates to the driver.** Now that TaskMetrics are expressed in terms of accumulators, we can capture all TaskMetrics values if we just send accumulator updates from the executors to the driver. This completes the parent issue SPARK-10620.
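      
      A simplified, self-contained sketch of the wrapper idea (the class and metric names are illustrative, not the actual TaskMetrics code):
      ```scala
      // Each metric is backed by an accumulator-like cell, so the driver only ever
      // needs the accumulator updates to reconstruct the task's metrics.
      final class LongAccumulatorSketch(val name: String) {
        private var _value = 0L
        def add(v: Long): Unit = { _value += v }
        def value: Long = _value
      }
      
      final class TaskMetricsSketch {
        private val _bytesRead = new LongAccumulatorSketch("internal.metrics.input.bytesRead")
        private val _recordsRead = new LongAccumulatorSketch("internal.metrics.input.recordsRead")
      
        def incBytesRead(v: Long): Unit = _bytesRead.add(v)
        def incRecordsRead(v: Long): Unit = _recordsRead.add(v)
      
        // The "TaskMetrics" view is just a read of the underlying accumulators...
        def bytesRead: Long = _bytesRead.value
        def recordsRead: Long = _recordsRead.value
        // ...and this is all the executor needs to ship back to the driver.
        def accumulatorUpdates: Map[String, Long] =
          Map(_bytesRead.name -> _bytesRead.value, _recordsRead.name -> _recordsRead.value)
      }
      ```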
      
      While an effort has been made to preserve as much of the public API as possible, there were a few known breaking DeveloperApi changes that would be very awkward to maintain. I will gather the full list shortly and post it here.
      
      Note: This was once part of #10717. It is split out into its own patch to make it easier for others to review. Other smaller pieces have already been merged into master.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #10835 from andrewor14/task-metrics-use-accums.
    • [SPARK-10847][SQL][PYSPARK] Pyspark - DataFrame - Optional Metadata with... · edd47375
      Jason Lee authored
      [SPARK-10847][SQL][PYSPARK] Pyspark - DataFrame - Optional Metadata with `None` triggers cryptic failure
      
      The error message is now changed from "Do not support type class scala.Tuple2." to "Do not support type class org.json4s.JsonAST$JNull$" to be more informative about what is not supported. Also, StructType metadata now handles JNull correctly, i.e., {'a': None}. test_metadata_null is added to tests.py to show the fix works.
      
      Author: Jason Lee <cjlee@us.ibm.com>
      
      Closes #8969 from jasoncl/SPARK-10847.
    • [SPARK-13023][PROJECT INFRA] Fix handling of root module in modules_to_test() · 41f0c85f
      Josh Rosen authored
      There's a minor bug in how we handle the `root` module in the `modules_to_test()` function in `dev/run-tests.py`: because `root` now depends on `build` (since every test needs to run on any build test), we now need to check for the presence of `root` in `modules_to_test` instead of `changed_modules`.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #10933 from JoshRosen/build-module-fix.
    • [SPARK-1680][DOCS] Explain environment variables for running on YARN in cluster mode · 093291cf
      Andrew authored
      SPARK-1680 added a property called spark.yarn.appMasterEnv. This PR draws users' attention to this special case by adding an explanation to configuration.html#environment-variables.
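      
      An example of the property family being documented (the environment variable names and values are only illustrative):
      ```scala
      import org.apache.spark.SparkConf
      
      // spark.yarn.appMasterEnv.[EnvironmentVariableName] sets an environment
      // variable on the YARN Application Master in cluster mode.
      val conf = new SparkConf()
        .setAppName("yarn-env-example")
        .set("spark.yarn.appMasterEnv.JAVA_HOME", "/usr/lib/jvm/java-8-openjdk")
        .set("spark.yarn.appMasterEnv.PYSPARK_PYTHON", "/usr/bin/python2.7")
      ```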
      
      Author: Andrew <weiner.andrew.j@gmail.com>
      
      Closes #10869 from weineran/branch-yarn-docs.
    • [SPARK-12983][CORE][DOC] Correct metrics.properties.template · 90b0e562
      BenFradet authored
      There are some typos or plain unintelligible sentences in the metrics template.
      
      Author: BenFradet <benjamin.fradet@gmail.com>
      
      Closes #10902 from BenFradet/SPARK-12983.
  5. Jan 26, 2016
    • [SPARK-12780] Inconsistency returning value of ML python models' properties · 4db255c7
      Xusen Yin authored
      https://issues.apache.org/jira/browse/SPARK-12780
      
      Author: Xusen Yin <yinxusen@gmail.com>
      
      Closes #10724 from yinxusen/SPARK-12780.
    • [SPARK-12967][NETTY] Avoid NettyRpc error message during sparkContext shutdown · bae3c9a4
      Nishkam Ravi authored
      If there's an RPC issue while the SparkContext is alive but stopped (which would happen only while SparkContext.stop is executing), log a warning instead of an error. This is a common occurrence.
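      
      A simplified sketch of the idea (illustrative, not the actual NettyRpc code):
      ```scala
      // Downgrade RPC failures to warnings once shutdown has begun, since they are
      // expected while SparkContext.stop() is running.
      def logRpcFailure(stopped: Boolean, message: String,
                        logWarning: String => Unit, logError: String => Unit): Unit = {
        if (stopped) {
          logWarning(s"Ignoring RPC failure during shutdown: $message")
        } else {
          logError(s"RPC failure: $message")
        }
      }
      ```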
      
      vanzin
      
      Author: Nishkam Ravi <nishkamravi@gmail.com>
      Author: nishkamravi2 <nishkamravi@gmail.com>
      
      Closes #10881 from nishkamravi2/master_netty.
    • [SPARK-12728][SQL] Integrates SQL generation with native view · 58f5d8c1
      Cheng Lian authored
      This PR is a follow-up of PR #10541. It integrates the newly introduced SQL generation feature with native view to make native view canonical.
      
      In this PR, a new SQL option `spark.sql.nativeView.canonical` is added. When this option and `spark.sql.nativeView` are both `true`, Spark SQL tries to handle `CREATE VIEW` DDL statements using SQL query strings generated from view definition logical plans. If we fail to map the plan to SQL, we fall back to the original native view approach.
      
      One important issue this PR fixes is that we can now use CTEs when defining a view. Originally, when native view is turned on, we wrap the view definition text in an extra `SELECT`. However, the HiveQL parser doesn't allow a CTE to appear as a subquery, so something like this is disallowed:
      
      ```sql
      SELECT n
      FROM (
        WITH w AS (SELECT 1 AS n)
        SELECT * FROM w
      ) v
      ```
      
      This PR fixes this issue because the extra `SELECT` is no longer needed (also, CTE expressions are inlined as subqueries during the analysis phase, so there won't be CTE expressions in the generated SQL query string).
      
      Author: Cheng Lian <lian@databricks.com>
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #10733 from liancheng/spark-12728.integrate-sql-gen-with-native-view.
    • [SPARK-12935][SQL] DataFrame API for Count-Min Sketch · ce38a35b
      Cheng Lian authored
      This PR integrates Count-Min Sketch from spark-sketch into DataFrame. This version resorts to `RDD.aggregate` for building the sketch. A more performant UDAF version can be built in future follow-up PRs.
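      
      A usage sketch (the column name and parameters are illustrative; assumes a `SQLContext` named `sqlContext`):
      ```scala
      // Build a Count-Min Sketch over a DataFrame column via df.stat and query
      // approximate item counts from it.
      val df = sqlContext.range(0, 1000000)
      val cms = df.stat.countMinSketch("id", 0.01, 0.99, 42)   // eps, confidence, seed
      cms.estimateCount(42L)   // approximate number of rows with id == 42
      ```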
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #10911 from liancheng/cms-df-api.