  1. Mar 06, 2015
    • [SPARK-6178][Shuffle] Removed unused imports · dba0b2ea
      Vinod K C authored
      Author: Vinod K C <vinod.kc@huawei.com>
      
      Closes #4900 from vinodkc/unused_imports and squashes the following commits:
      
      5373456 [Vinod K C] Removed empty lines
      9da7438 [Vinod K C] Changed order of import
      594d471 [Vinod K C] Removed unused imports
      dba0b2ea
    • [Minor] Resolve sbt warnings: postfix operator second should be enabled · 05cb6b34
      GuoQiang Li authored
      Resolve sbt warnings:
      
      ```
      [warn] spark/streaming/src/main/scala/org/apache/spark/streaming/util/WriteAheadLogManager.scala:155: postfix operator second should be enabled
      [warn] by making the implicit value scala.language.postfixOps visible.
      [warn] This can be achieved by adding the import clause 'import scala.language.postfixOps'
      [warn] or by setting the compiler option -language:postfixOps.
      [warn] See the Scala docs for value scala.language.postfixOps for a discussion
      [warn] why the feature should be explicitly enabled.
      [warn]         Await.ready(f, 1 second)
      [warn]                          ^
      ```
      
      Author: GuoQiang Li <witgo@qq.com>
      
      Closes #4908 from witgo/sbt_warnings and squashes the following commits:
      
      0629af4 [GuoQiang Li] Resolve sbt warnings: postfix operator second should be enabled
      05cb6b34
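
      For readers unfamiliar with the warning above, here is a minimal, self-contained illustration of the feature flag it asks for; the demo object and values are made up:

      ```scala
      import scala.concurrent.{Await, Future}
      import scala.concurrent.duration._
      import scala.language.postfixOps // the import the warning asks for

      object PostfixOpsDemo extends App {
        // `1 second` calls the `second` method in postfix position, which is
        // exactly the construct the sbt warning flags. Writing `1.second`
        // instead would avoid the language feature (and this import) entirely.
        Await.ready(Future.successful(42), 1 second)
      }
      ```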
    • [core] [minor] Don't pollute source directory when running UtilsSuite. · cd7594ca
      Marcelo Vanzin authored
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      
      Closes #4921 from vanzin/utils-suite and squashes the following commits:
      
      7795dd4 [Marcelo Vanzin] [core] [minor] Don't pollute source directory when running UtilsSuite.
      cd7594ca
    • [CORE, DEPLOY][minor] align arguments order with docs of worker · d8b3da9d
      Zhang, Liye authored
      The help message for starting `worker` is `Usage: Worker [options] <master>`. In `start-slaves.sh`, however, the argument order does not align with that, which is confusing at first glance.
      
      Author: Zhang, Liye <liye.zhang@intel.com>
      
      Closes #4924 from liyezhang556520/startSlaves and squashes the following commits:
      
      7fd5deb [Zhang, Liye] align arguments order with docs of worker
      d8b3da9d
  2. Mar 05, 2015
    • [SQL] Make Strategies a public developer API · eb48fd6e
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #4920 from marmbrus/openStrategies and squashes the following commits:
      
      cbc35c0 [Michael Armbrust] [SQL] Make Strategies a public developer API
      eb48fd6e
    • [SPARK-6163][SQL] jsonFile should be backed by the data source API · 1b4bb25c
      Yin Huai authored
      jira: https://issues.apache.org/jira/browse/SPARK-6163
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #4896 from yhuai/SPARK-6163 and squashes the following commits:
      
      45e023e [Yin Huai] Address @chenghao-intel's comment.
      2e8734e [Yin Huai] Use JSON data source for jsonFile.
      92a4a33 [Yin Huai] Test.
      1b4bb25c
    • [SPARK-6145][SQL] fix ORDER BY on nested fields · 5873c713
      Wenchen Fan authored
      Based on #4904 with style errors fixed.
      
      `LogicalPlan#resolve` will produce not only `Attribute`s, but also "`GetField` chains".
      So in `ResolveSortReferences`, after resolving the ordering expressions, we should collect not just the `Attribute` results but also the `Attribute`s at the bottom of the "`GetField` chains" (see the toy sketch after this entry).
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #4918 from marmbrus/pr/4904 and squashes the following commits:
      
      997f84e [Michael Armbrust] fix style
      3eedbfc [Wenchen Fan] fix 6145
      5873c713
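
      A toy sketch of the resolution issue described above, using a simplified AST rather than Catalyst's real classes: the attribute may sit at the bottom of a chain of `GetField` nodes instead of at the top of the resolved expression.

      ```scala
      // Simplified stand-ins for Catalyst's expression tree (illustration only).
      sealed trait Expr
      case class Attr(name: String) extends Expr
      case class GetField(child: Expr, field: String) extends Expr

      object SortResolutionSketch extends App {
        // Walk down a GetField chain to the underlying attribute.
        def rootAttr(e: Expr): Attr = e match {
          case a: Attr            => a
          case GetField(child, _) => rootAttr(child)
        }

        // ORDER BY a.b.c resolves to GetField(GetField(Attr("a"), "b"), "c"),
        // whose bottom attribute is Attr("a").
        assert(rootAttr(GetField(GetField(Attr("a"), "b"), "c")) == Attr("a"))
      }
      ```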
    • [SPARK-6175] Fix standalone executor log links when ephemeral ports or SPARK_PUBLIC_DNS are used · 424a86a1
      Josh Rosen authored
      This patch fixes two issues with the executor log viewing links added in Spark 1.3.  In standalone mode, the log URLs might include a port value of 0 rather than the actual bound port of the UI, which broke the ability to view logs from workers whose web UIs had been configured to bind to ephemeral ports.  In addition, the URLs used workers' local hostnames instead of respecting SPARK_PUBLIC_DNS, which prevented this feature from working properly on Spark EC2 clusters because the links would point to internal DNS names instead of external ones.
      
      I included tests for both of these bugs:
      
      - We now browse to the URLs and verify that they point to the expected pages.
      - To test SPARK_PUBLIC_DNS, I changed the code that reads the environment variable to do so via `SparkConf.getenv`, then used a custom SparkConf subclass to mock the environment variable (this pattern is used elsewhere in Spark's tests; a toy sketch follows this entry).
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #4903 from JoshRosen/SPARK-6175 and squashes the following commits:
      
      5577f41 [Josh Rosen] Remove println
      cfec135 [Josh Rosen] Use webUi.boundPort and publicAddress in log links
      27918c7 [Josh Rosen] Add failing unit tests for standalone log URL viewing
      c250fbe [Josh Rosen] Respect SparkConf in local-cluster Workers.
      422a2ef [Josh Rosen] Use conf.getenv to read SPARK_PUBLIC_DNS
      424a86a1
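
      The environment-variable mocking pattern described in the second bullet, sketched with toy classes (in Spark itself the hook is the package-private `SparkConf.getenv`, so the real subclass lives in Spark's own test sources):

      ```scala
      // Production code reads the environment through an overridable method ...
      class Conf {
        def getenv(name: String): String = System.getenv(name)
        def publicDns: Option[String] = Option(getenv("SPARK_PUBLIC_DNS"))
      }

      // ... so a test can substitute a fixed environment without touching the JVM's.
      class MockEnvConf(env: Map[String, String]) extends Conf {
        override def getenv(name: String): String = env.getOrElse(name, super.getenv(name))
      }

      object MockEnvDemo extends App {
        val conf = new MockEnvConf(Map("SPARK_PUBLIC_DNS" -> "ec2-1-2-3-4.example.com"))
        assert(conf.publicDns.contains("ec2-1-2-3-4.example.com"))
      }
      ```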
    • [SPARK-6090][MLLIB] add a basic BinaryClassificationMetrics to PySpark/MLlib · 0bfacd5c
      Xiangrui Meng authored
      A simple wrapper around the Scala implementation. `DataFrame` is used for serialization/deserialization. Methods that return `RDD`s are not supported in this PR.
      
      davies If we recognize Scala's `Product`s in Py4J, we can easily add wrappers for Scala methods that return `RDD[(Double, Double)]`. Is it easy to register a serializer for `Product` in PySpark?
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #4863 from mengxr/SPARK-6090 and squashes the following commits:
      
      009a3a3 [Xiangrui Meng] provide schema
      dcddab5 [Xiangrui Meng] add a basic BinaryClassificationMetrics to PySpark/MLlib
      0bfacd5c
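
      For context, a minimal usage of the Scala implementation that the new Python class wraps; the scores and labels below are made up:

      ```scala
      import org.apache.spark.{SparkConf, SparkContext}
      import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

      object MetricsDemo extends App {
        val sc = new SparkContext(new SparkConf().setAppName("demo").setMaster("local[2]"))
        // (score, label) pairs; labels are 0.0 or 1.0.
        val scoreAndLabels = sc.parallelize(Seq((0.9, 1.0), (0.7, 1.0), (0.4, 0.0), (0.2, 0.0)))
        val metrics = new BinaryClassificationMetrics(scoreAndLabels)
        println(s"AUC-ROC = ${metrics.areaUnderROC()}")
        sc.stop()
      }
      ```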
    • SPARK-6182 [BUILD] spark-parent pom needs to be published for both 2.10 and 2.11 · c9cfba0c
      Sean Owen authored
      Option 1 of 2: Convert spark-parent module name to spark-parent_2.10 / spark-parent_2.11
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4912 from srowen/SPARK-6182.1 and squashes the following commits:
      
      eff60de [Sean Owen] Convert spark-parent module name to spark-parent_2.10 / spark-parent_2.11
      c9cfba0c
    • [SPARK-6153] [SQL] promote guava dep for hive-thriftserver · e06c7dfb
      Daoyuan Wang authored
      For the thriftserver package, Guava is used at runtime.
      
      /cc pwendell
      
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #4884 from adrian-wang/test and squashes the following commits:
      
      4600ae7 [Daoyuan Wang] only promote for thriftserver
      44dda18 [Daoyuan Wang] promote guava dep for hive
      e06c7dfb
  3. Mar 04, 2015
    • SPARK-5143 [BUILD] [WIP] spark-network-yarn 2.11 depends on spark-network-shuffle 2.10 · 7ac072f7
      Sean Owen authored
      Update `<scala.binary.version>` prop in POM when switching between Scala 2.10/2.11
      
      ScrapCodes for review. This `sed` command is supposed to just replace the first occurrence, but it replaces them all. Are you more of a `sed` wizard than I? It may be a GNU/BSD thing that is throwing me off. Really, just the first instance should be replaced, hence the `[WIP]`.
      
      NB: on OS X the original `sed` command here will create files like `pom.xml-e` throughout the source tree, though it otherwise works. It's like `-e` is also treated as the arg to `-i`. I couldn't get rid of that even with `-i""`. No biggie.
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4876 from srowen/SPARK-5143 and squashes the following commits:
      
      b060c44 [Sean Owen] Oops, fixed reversed version numbers!
      e875d4a [Sean Owen] Add note about non-GNU sed; fix new pom.xml update to work as intended on GNU sed
      703e1eb [Sean Owen] Update scala.binary.version prop in POM when switching between Scala 2.10/2.11
      7ac072f7
    • [SPARK-6149] [SQL] [Build] Excludes Guava 15 referenced by jackson-module-scala_2.10 · 1aa90e39
      Cheng Lian authored
      This PR excludes Guava 15.0 from the SBT build, to make Spark SQL CLI (`bin/spark-sql`) work when compiled against Hive 0.12.0.
      
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #4890 from liancheng/exclude-guava-15 and squashes the following commits:
      
      91ae9fa [Cheng Lian] Moves Guava 15 exclusion from SBT build to POM
      282bd2a [Cheng Lian] Excludes Guava 15 referenced by jackson-module-scala_2.10
      1aa90e39
    • [SPARK-6144] [core] Fix addFile when source files are on "hdfs:" · 3a35a0df
      Marcelo Vanzin authored
      The code failed in two modes: it complained when it tried to re-create a directory that already existed, and it was placing some files in the wrong parent directory. The patch fixes both issues.
      
      Author: Marcelo Vanzin <vanzin@cloudera.com>
      Author: trystanleftwich <trystan@atscale.com>
      
      Closes #4894 from vanzin/SPARK-6144 and squashes the following commits:
      
      100b3a1 [Marcelo Vanzin] Style fix.
      58266aa [Marcelo Vanzin] Fix fetchHcfs file for directories.
      91733b7 [trystanleftwich] [SPARK-6144]When in cluster mode using ADD JAR with a hdfs:// sourced jar will fail
      3a35a0df
    • [SPARK-6107][CORE] Display inprogress application information for event log... · f6773edc
      Zhang, Liye authored
      [SPARK-6107][CORE] Display inprogress application information for event log history for standalone mode
      
      When an application finishes abnormally (Ctrl + C, for example), the history event log file still ends with the `.inprogress` suffix, and the application state cannot be shown on the web UI; the user can only see "*Application history not found xxxx, Application xxx is still in progress*".
      
      For an application that did not finish normally, the history will show:
      ![image](https://cloud.githubusercontent.com/assets/4716022/6437137/184f9fc0-c0f5-11e4-88cc-a2eb087e4561.png)
      
      Author: Zhang, Liye <liye.zhang@intel.com>
      
      Closes #4848 from liyezhang556520/showLogInprogress and squashes the following commits:
      
      03589ac [Zhang, Liye] change inprogress to in progress
      b55f19f [Zhang, Liye] scala modify after rebase
      8aa66a2 [Zhang, Liye] use softer wording
      b030bd4 [Zhang, Liye] clean code
      79c8cb1 [Zhang, Liye] fix some mistakes
      11cdb68 [Zhang, Liye] add a missing space
      c29205b [Zhang, Liye] refine code according to sean owen's comments
      e9952a7 [Zhang, Liye] scala style fix again
      150502d [Zhang, Liye] scala style fix
      f11a5da [Zhang, Liye] small fix for file path
      22e878b [Zhang, Liye] enable in progress eventlog file
      f6773edc
    • [SPARK-6134][SQL] Fix wrong datatype for casting FloatType and default... · aef8a84e
      Liang-Chi Hsieh authored
      [SPARK-6134][SQL] Fix wrong datatype for casting FloatType and default LongType value in defaultPrimitive
      
      In `CodeGenerator`, the casting on `FloatType` should use `FloatType` instead of `IntegerType`.
      
      Besides, `defaultPrimitive` for `LongType` should be `-1L` instead of `1L` (see the sketch after this entry).
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #4870 from viirya/codegen_type and squashes the following commits:
      
      76311dd [Liang-Chi Hsieh] Fix wrong datatype for casting on FloatType. Fix the wrong value for LongType in defaultPrimitive.
      aef8a84e
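
      A simplified sketch of the corrected default-value mapping (the real table lives in `CodeGenerator`; the surrounding cases here are illustrative, not copied from the commit):

      ```scala
      import org.apache.spark.sql.types._

      object DefaultPrimitiveSketch {
        // Generated code needs a placeholder value per primitive type; after
        // this fix LongType yields "-1L" (previously "1L"), consistent with
        // the other numeric types.
        def defaultPrimitive(dt: DataType): String = dt match {
          case BooleanType => "false"
          case IntegerType => "-1"
          case LongType    => "-1L"
          case FloatType   => "-1.0f"
          case _           => "null"
        }
      }
      ```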
    • [SPARK-6136] [SQL] Removed JDBC integration tests which depends on docker-client · 76b472f1
      Cheng Lian authored
      Integration test suites in the JDBC data source (`MySQLIntegration` and `PostgresIntegration`) depend on docker-client 2.7.5, which transitively depends on Guava 17.0. Unfortunately, Guava 17.0 is causing test runtime binary compatibility issues when Spark is compiled against Hive 0.12.0, or Hadoop 2.4.
      
      Considering `MySQLIntegration` and `PostgresIntegration` are ignored right now, I'd suggest moving them from the Spark project to the [Spark integration tests] [1] project. This PR removes both the JDBC data source integration tests and the docker-client test dependency.
      
      [1]: https://github.com/databricks/spark-integration-tests
      
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #4872 from liancheng/remove-docker-client and squashes the following commits:
      
      1f4169e [Cheng Lian] Removes DockerHacks
      159b24a [Cheng Lian] Removed JDBC integration tests which depends on docker-client
      76b472f1
    • [SPARK-3355][Core]: Allow running maven tests in run-tests · 418f38d9
      Brennon York authored
      Added an AMPLAB_JENKINS_BUILD_TOOL env variable to allow differentiation between the Maven and sbt build / test suites. The only issue I found with this is that, when running Maven builds, I wasn't able to get individual package tests running without running a `mvn install` first. Not sure what Jenkins is doing wrt its env, but I figured it's much better to just test everything than to install packages in the "~/.m2/" directory and only test individual items, esp. if this is predominantly for the Jenkins build. Thoughts / comments would be great!
      
      Author: Brennon York <brennon.york@capitalone.com>
      
      Closes #4734 from brennonyork/SPARK-3355 and squashes the following commits:
      
      c813d32 [Brennon York] changed mvn call from 'clean compile
      616ce30 [Brennon York] fixed merge conflicts
      3540de9 [Brennon York] added an AMPLAB_JENKINS_BUILD_TOOL env. variable to allow differentiation between maven and sbt build / test suites
      418f38d9
    • SPARK-6085 Increase default value for memory overhead · 8d3e2414
      tedyu authored
      Author: tedyu <yuzhihong@gmail.com>
      
      Closes #4836 from tedyu/master and squashes the following commits:
      
      d65b495 [tedyu] SPARK-6085 Increase default value for memory overhead
      1fdd4df [tedyu] SPARK-6085 Increase default value for memory overhead
      8d3e2414
    • [SPARK-6141][MLlib] Upgrade Breeze from 0.10 to 0.11 to fix convergence bug · 76e20a0a
      Xiangrui Meng authored
      LBFGS and OWLQN in Breeze 0.10 have a convergence check bug.
      This is fixed in 0.11; see the description in the Breeze project for details:
      
      https://github.com/scalanlp/breeze/pull/373#issuecomment-76879760
      
      Author: Xiangrui Meng <meng@databricks.com>
      Author: DB Tsai <dbtsai@alpinenow.com>
      Author: DB Tsai <dbtsai@dbtsai.com>
      
      Closes #4879 from dbtsai/breeze and squashes the following commits:
      
      d848f65 [DB Tsai] Merge pull request #1 from mengxr/AlpineNow-breeze
      c2ca6ac [Xiangrui Meng] upgrade to breeze-0.11.1
      35c2f26 [Xiangrui Meng] fix LRSuite
      397a208 [DB Tsai] upgrade breeze
      76e20a0a
  4. Mar 03, 2015
    • [SPARK-6132][HOTFIX] ContextCleaner InterruptedException should be quiet · d334bfbc
      Andrew Or authored
      If the cleaner is stopped, we shouldn't print a huge stack trace when the cleaner thread is interrupted because we purposefully did this.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #4882 from andrewor14/cleaner-interrupt and squashes the following commits:
      
      8652120 [Andrew Or] Just a hot fix
      d334bfbc
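
      A sketch of the pattern behind the hot fix, with hypothetical names: once the cleaner has been stopped deliberately, an `InterruptedException` in its thread is expected and should not produce a stack trace.

      ```scala
      object QuietCleanerSketch {
        @volatile private var stopped = false

        private val cleaningThread = new Thread("cleaner") {
          override def run(): Unit = {
            while (!stopped) {
              try {
                Thread.sleep(1000) // stand-in for blocking on the reference queue
              } catch {
                case _: InterruptedException if stopped => // deliberate interrupt: stay quiet
                case e: InterruptedException            => e.printStackTrace() // unexpected
              }
            }
          }
        }

        def start(): Unit = cleaningThread.start()
        def stop(): Unit = { stopped = true; cleaningThread.interrupt() }
      }
      ```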
    • [SPARK-5949] HighlyCompressedMapStatus needs more classes registered w/ kryo · 1f1fccc5
      Imran Rashid authored
      https://issues.apache.org/jira/browse/SPARK-5949
      
      Author: Imran Rashid <irashid@cloudera.com>
      
      Closes #4877 from squito/SPARK-5949_register_roaring_bitmap and squashes the following commits:
      
      7e13316 [Imran Rashid] style style style
      5f6bb6d [Imran Rashid] more style
      709bfe0 [Imran Rashid] style
      a5cb744 [Imran Rashid] update tests to cover both types of RoaringBitmapContainers
      09610c6 [Imran Rashid] formatting
      f9a0b7c [Imran Rashid] put primitive array registrations together
      97beaf8 [Imran Rashid] SPARK-5949 HighlyCompressedMapStatus needs more classes registered w/ kryo
      1f1fccc5
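
      An illustration of the kind of registration this adds (the authoritative class list is in Spark's `KryoSerializer`): with `spark.kryo.registrationRequired=true`, every class reachable from `HighlyCompressedMapStatus`, including the RoaringBitmap internals, must be registered.

      ```scala
      import com.esotericsoftware.kryo.Kryo
      import org.roaringbitmap.{ArrayContainer, BitmapContainer, RoaringArray, RoaringBitmap}

      object RoaringRegistrationSketch {
        def register(kryo: Kryo): Unit = {
          kryo.register(classOf[RoaringBitmap])
          kryo.register(classOf[RoaringArray])
          kryo.register(classOf[ArrayContainer])
          kryo.register(classOf[BitmapContainer])
          kryo.register(classOf[Array[Short]]) // primitive arrays backing the containers
          kryo.register(classOf[Array[Long]])
        }
      }
      ```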
    • [SPARK-6133] Make sc.stop() idempotent · 6c20f352
      Andrew Or authored
      Before, we would get the following (benign) error if we called `sc.stop()` twice. This is because the listener bus would try to post the end event again even after it had already stopped. This happens occasionally when flaky tests fail, usually as a result of other sources of error. Either way, we shouldn't be logging this error when it is not the cause of the failure. A sketch of the guard appears after this entry.
      ```
      ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerApplicationEnd(1425348445682)
      ```
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #4871 from andrewor14/sc-stop and squashes the following commits:
      
      a14afc5 [Andrew Or] Move code after code
      915db16 [Andrew Or] Move code into code
      6c20f352
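
      The idempotency guard itself is a one-liner; here is a minimal sketch of the shape (a hypothetical class, not Spark's actual `SparkContext` code):

      ```scala
      import java.util.concurrent.atomic.AtomicBoolean

      class StoppableService {
        private val stopped = new AtomicBoolean(false)

        def stop(): Unit = {
          // Only the first caller wins; a second stop() returns without
          // posting events to an already-stopped listener bus.
          if (!stopped.compareAndSet(false, true)) return
          // ... post SparkListenerApplicationEnd, stop the bus, release resources ...
        }
      }
      ```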
    • [SPARK-6132] ContextCleaner race condition across SparkContexts · fe63e822
      Andrew Or authored
      The problem is that `ContextCleaner` may clean variables that belong to a different `SparkContext`. This can happen if the `SparkContext` to which the cleaner belongs stops, and a new one is started immediately afterwards in the same JVM. In this case, if the cleaner is in the middle of cleaning a broadcast, for instance, it will do so through `SparkEnv.get.blockManager`, which could be one that belongs to a different `SparkContext`.
      
      JoshRosen and I suspect that this is the cause of many flaky tests, most notably the `JavaAPISuite`. We were able to reproduce the failure locally (though it is not deterministic and very hard to reproduce).
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #4869 from andrewor14/cleaner-masquerade and squashes the following commits:
      
      29168c0 [Andrew Or] Synchronize ContextCleaner stop
      fe63e822
    • SPARK-1911 [DOCS] Warn users if their assembly jars are not built with Java 6 · e750a6bf
      Sean Owen authored
      Add warning about building with Java 7+ and running the JAR on early Java 6.
      
      CC andrewor14
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #4874 from srowen/SPARK-1911 and squashes the following commits:
      
      79fa2f6 [Sean Owen] Add warning about building with Java 7+ and running the JAR on early Java 6.
      e750a6bf
    • Revert "[SPARK-5423][Core] Cleanup resources in DiskMapIterator.finalize to... · 9af00174
      Andrew Or authored
      Revert "[SPARK-5423][Core] Cleanup resources in DiskMapIterator.finalize to ensure deleting the temp file"
      
      This reverts commit 90095bf3.
      9af00174
    • [SPARK-6138][CORE][minor] enhance the `toArray` method in `SizeTrackingVector` · e359794c
      Wenchen Fan authored
      Use an array copy instead of `Iterator#toArray` to make it more efficient (see the sketch after this entry).
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #4825 from cloud-fan/minor and squashes the following commits:
      
      c933ee5 [Wenchen Fan] make toArray method just in parent
      946a35b [Wenchen Fan] minor enhance
      e359794c
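
      A sketch of the enhancement with hypothetical field names: size-tracked append-only storage can return its contents with one bulk `System.arraycopy` instead of materializing an `Iterator`.

      ```scala
      class GrowableVectorSketch(capacity: Int) {
        private val array = new Array[AnyRef](capacity)
        private var numElements = 0

        def +=(value: AnyRef): Unit = { array(numElements) = value; numElements += 1 }

        def toArray: Array[AnyRef] = {
          val dest = new Array[AnyRef](numElements)         // exact-size destination
          System.arraycopy(array, 0, dest, 0, numElements)  // single bulk copy
          dest
        }
      }
      ```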
    • [SPARK-6118] making package name of deploy.worker.CommandUtils and... · 975643c2
      CodingCat authored
      [SPARK-6118] making package name of deploy.worker.CommandUtils and deploy.CommandUtilsSuite consistent
      
      https://issues.apache.org/jira/browse/SPARK-6118
      
      I found that the object CommandUtils is placed under the deploy.worker package, while CommandUtilsSuite is under deploy.
      
      Conventionally, we put the implementation and its unit test class under the same package.
      
      Here, to minimize the change, I move CommandUtilsSuite to the worker package.
      
      **However, CommandUtils seems to contain some general methods (though they are currently only used by worker.* classes)**, so we may also consider relocating CommandUtils.
      
      Author: CodingCat <zhunansjtu@gmail.com>
      
      Closes #4856 from CodingCat/SPARK-6118 and squashes the following commits:
      
      cb93700 [CodingCat] making package name consistent
      975643c2
    • BUILD: Minor tweaks to internal build scripts · 0c9a8eae
      Patrick Wendell authored
      This adds two features:
      1. The ability to publish with a different maven version than
         that specified in the release source.
      2. Forking of different Zinc instances during the parallel dist
         creation (to help with some stability issues).
      0c9a8eae
    • HOTFIX: Bump HBase version in MapR profiles. · 165ff364
      Patrick Wendell authored
      After #2982 (SPARK-4048) we rely on the newer HBase packaging format.
      165ff364
    • [SPARK-5537][MLlib][Docs] Add user guide for multinomial logistic regression · b1960561
      DB Tsai authored
      Adding more description on top of #4861.
      
      Author: DB Tsai <dbtsai@alpinenow.com>
      
      Closes #4866 from dbtsai/doc and squashes the following commits:
      
      37e9d07 [DB Tsai] doc
      b1960561
    • [SPARK-6120] [mllib] Warnings about memory in tree, ensemble model save · c2fe3a6f
      Joseph K. Bradley authored
      Issue: When the Python DecisionTree example in the programming guide is run, it runs out of Java Heap Space when using the default memory settings for the spark shell.
      
      This prints a warning.
      
      CC: mengxr
      
      Author: Joseph K. Bradley <joseph@databricks.com>
      
      Closes #4864 from jkbradley/dt-save-heap and squashes the following commits:
      
      02e8daf [Joseph K. Bradley] fixed based on code review
      7ecb1ed [Joseph K. Bradley] Added warnings about memory when calling tree and ensemble model save with too small a Java heap size
      c2fe3a6f
    • [SPARK-6097][MLLIB] Support tree model save/load in PySpark/MLlib · 7e53a79c
      Xiangrui Meng authored
      Similar to `MatrixFactorizationModel`, we only need wrappers to support save/load for tree models in Python.
      
      jkbradley
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #4854 from mengxr/SPARK-6097 and squashes the following commits:
      
      4586a4d [Xiangrui Meng] fix more typos
      8ebcac2 [Xiangrui Meng] fix python style
      91172d8 [Xiangrui Meng] fix typos
      201b3b9 [Xiangrui Meng] update user guide
      b5158e2 [Xiangrui Meng] support tree model save/load in PySpark/MLlib
      7e53a79c
    • [SPARK-5310][SQL] Fixes to Docs and Datasources API · 54d19689
      Reynold Xin authored
       - Various Fixes to docs
       - Make data source traits actually interfaces
      
      Based on #4862 but with fixed conflicts.
      
      Author: Reynold Xin <rxin@databricks.com>
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #4868 from marmbrus/pr/4862 and squashes the following commits:
      
      fe091ea [Michael Armbrust] Merge remote-tracking branch 'origin/master' into pr/4862
      0208497 [Reynold Xin] Test fixes.
      34e0a28 [Reynold Xin] [SPARK-5310][SQL] Various fixes to Spark SQL docs.
      54d19689
  5. Mar 02, 2015
    • [SPARK-5950][SQL]Insert array into a metastore table saved as parquet should... · 12599942
      Yin Huai authored
      [SPARK-5950][SQL]Insert array into a metastore table saved as parquet should work when using datasource api
      
      This PR contains the following changes:
      1. Add a new method, `DataType.equalsIgnoreCompatibleNullability`, which is the middle ground between DataType's equality check and `DataType.equalsIgnoreNullability`. For two data types `from` and `to`, it does `equalsIgnoreNullability` and also checks whether the nullability of `from` is compatible with that of `to`. For example, the nullability of `ArrayType(IntegerType, containsNull = false)` is compatible with that of `ArrayType(IntegerType, containsNull = true)` (for an array without null values, we can always say it may contain null values). However, the nullability of `ArrayType(IntegerType, containsNull = true)` is incompatible with that of `ArrayType(IntegerType, containsNull = false)` (for an array that may have null values, we cannot say it does not have null values). A sketch of this rule appears after this entry.
      2. For the `resolved` field of `InsertIntoTable`, use `equalsIgnoreCompatibleNullability` to replace the equality check of the data types.
      3. For our data source write path, when appending data, we always use the schema of the existing table to write the data. This is important for Parquet, since nullability directly impacts the way to encode/decode values. If we do not do this, we may see corrupted values when reading values from a set of Parquet files generated with different nullability settings.
      4. When generating a new parquet table, we always set nullable/containsNull/valueContainsNull to true. So, we will not face situations that we cannot append data because containsNull/valueContainsNull in an Array/Map column of the existing table has already been set to `false`. This change makes the whole data pipeline more robust.
      5. Update the equality check of the JSON relation. Since JSON does not really care about nullability, `equalsIgnoreNullability` seems a better choice for comparing schemata of JSON tables.
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-5950
      
      Thanks viirya for the initial work in #4729.
      
      cc marmbrus liancheng
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #4826 from yhuai/insertNullabilityCheck and squashes the following commits:
      
      3b61a04 [Yin Huai] Revert change on equals.
      80e487e [Yin Huai] asNullable in UDT.
      587d88b [Yin Huai] Make methods private.
      0cb7ea2 [Yin Huai] marmbrus's comments.
      3cec464 [Yin Huai] Cheng's comments.
      486ed08 [Yin Huai] Merge remote-tracking branch 'upstream/master' into insertNullabilityCheck
      d3747d1 [Yin Huai] Remove unnecessary change.
      8360817 [Yin Huai] Merge remote-tracking branch 'upstream/master' into insertNullabilityCheck
      8a3f237 [Yin Huai] Use equalsIgnoreNullability instead of equality check.
      0eb5578 [Yin Huai] Fix tests.
      f6ed813 [Yin Huai] Update old parquet path.
      e4f397c [Yin Huai] Unit tests.
      b2c06f8 [Yin Huai] Ignore nullability in JSON relation's equality check.
      8bd008b [Yin Huai] nullable, containsNull, and valueContainsNull will be always true for parquet data.
      bf50d73 [Yin Huai] When appending data, we use the schema of the existing table instead of the schema of the new data.
      0a703e7 [Yin Huai] Test failed again since we cannot read correct content.
      9a26611 [Yin Huai] Make InsertIntoTable happy.
      8f19fe5 [Yin Huai] equalsIgnoreCompatibleNullability
      4ec17fd [Yin Huai] Failed test.
      12599942
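
      A simplified, arrays-only sketch of the compatibility rule from item 1 of the commit above (the real method also handles structs and maps): writing a non-nullable array where a nullable one is expected is fine, but not the reverse.

      ```scala
      import org.apache.spark.sql.types._

      object NullabilityCompatSketch extends App {
        def compatible(from: DataType, to: DataType): Boolean = (from, to) match {
          case (ArrayType(fromElem, fromNull), ArrayType(toElem, toNull)) =>
            (toNull || !fromNull) && compatible(fromElem, toElem)
          case (f, t) => f == t
        }

        // Non-nullable -> nullable is compatible; the reverse is not.
        assert(compatible(ArrayType(IntegerType, containsNull = false),
                          ArrayType(IntegerType, containsNull = true)))
        assert(!compatible(ArrayType(IntegerType, containsNull = true),
                           ArrayType(IntegerType, containsNull = false)))
      }
      ```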
    • [SPARK-6127][Streaming][Docs] Add Kafka to Python api docs · 9eb22ece
      Tathagata Das authored
      davies
      
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #4860 from tdas/SPARK-6127 and squashes the following commits:
      
      82de92a [Tathagata Das] Add Kafka to Python api docs
      9eb22ece
    • [SPARK-5537] Add user guide for multinomial logistic regression · 9d6c5aee
      Xiangrui Meng authored
      This is based on #4801 from dbtsai. The linear method guide is re-organized a little bit for this change.
      
      Closes #4801
      
      Author: Xiangrui Meng <meng@databricks.com>
      Author: DB Tsai <dbtsai@alpinenow.com>
      
      Closes #4861 from mengxr/SPARK-5537 and squashes the following commits:
      
      47af0ac [Xiangrui Meng] update user guide for multinomial logistic regression
      cdc2e15 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into AlpineNow-mlor-doc
      096d0ca [DB Tsai] first commit
      9d6c5aee
    • [SPARK-6121][SQL][MLLIB] simpleString for UDT · 2db6a853
      Xiangrui Meng authored
      `df.dtypes` shows `null` for UDTs. This PR uses `udt` by default, and `VectorUDT` overrides it with `vector` (see the toy sketch after this entry).
      
      jkbradley davies
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #4858 from mengxr/SPARK-6121 and squashes the following commits:
      
      34f0a77 [Xiangrui Meng] simpleString for UDT
      2db6a853
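
      A toy sketch of the behavior (not Catalyst's real class hierarchy): the generic UDT reports `udt`, and a vector UDT narrows it, which is what `df.dtypes` surfaces.

      ```scala
      abstract class ToyDataType { def simpleString: String }
      class ToyUDT extends ToyDataType { override def simpleString: String = "udt" }
      class ToyVectorUDT extends ToyUDT { override def simpleString: String = "vector" }

      object SimpleStringDemo extends App {
        assert(new ToyVectorUDT().simpleString == "vector")
      }
      ```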
    • [SPARK-4777][CORE] Some block memory after unrollSafely not count into used... · e3a88d11
      hushan[胡珊] authored
      [SPARK-4777][CORE] Some block memory after unrollSafely not count into used memory(memoryStore.entrys or unrollMemory)
      
      Some memory is not counted in the memory used by `memoryStore` or `unrollMemory`.
      After thread A unrolls a block safely, it releases 40 MB of unroll memory (which can then be used by other threads). Thread A then waits to acquire `accountingLock` so it can `tryToPut` block A (30 MB). Until thread A acquires `accountingLock`, block A's memory size is counted in neither `unrollMemory` nor `memoryStore.currentMemory`.
      IIUC, `freeMemory` should subtract that block's memory.
      
      So, put this released memory into a pending pool, and release it in `tryToPut` before `ensureSpace` (see the sketch after this entry).
      
      Author: hushan[胡珊] <hushan@xiaomi.com>
      
      Closes #3629 from suyanNone/unroll-memory and squashes the following commits:
      
      809cc41 [hushan[胡珊]] Refine
      407b2c9 [hushan[胡珊]] Refine according comments
      39960d0 [hushan[胡珊]] Refine comments
      0fd0213 [hushan[胡珊]] add comments
      0fc2bec [hushan[胡珊]] Release pending unroll memory after put block in memoryStore
      3a3f2c8 [hushan[胡珊]] Refine blockManagerSuite unroll test
      3323c45 [hushan[胡珊]] Refine getOrElse
      f664317 [hushan[胡珊]] Make sure not add pending in every releaseUnrollMemory call
      08b32ba [hushan[胡珊]] Pending unroll memory for this block untill tryToPut
      e3a88d11
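
      A sketch of the fix with hypothetical, simplified bookkeeping: memory released by a safe unroll is parked as "pending" and only returned to the free pool inside `tryToPut`, under the accounting lock, right before space is ensured for the incoming block.

      ```scala
      object PendingUnrollSketch {
        private val accountingLock = new Object
        private val maxMemory = 100L << 20 // 100 MB
        private var currentMemory = 0L
        private var pendingUnrollMemory = 0L

        // Called after a safe unroll: the bytes are no longer "unroll" memory,
        // but they must not look free until the block itself is accounted for.
        def moveUnrollToPending(bytes: Long): Unit = accountingLock.synchronized {
          pendingUnrollMemory += bytes
        }

        def tryToPut(blockSize: Long): Boolean = accountingLock.synchronized {
          currentMemory -= pendingUnrollMemory // release pending bytes only now
          pendingUnrollMemory = 0L
          val fits = currentMemory + blockSize <= maxMemory
          if (fits) currentMemory += blockSize
          fits
        }
      }
      ```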
    • [SPARK-6048] SparkConf should not translate deprecated configs on set · 258d154c
      Andrew Or authored
      There are multiple issues with translating on set, as outlined in the JIRA.
      
      This PR reverts the translation logic added to `SparkConf`. In the future, after the 1.3.0 release we will figure out a way to reorganize the internal structure more elegantly. For now, let's preserve the existing semantics of `SparkConf` since it's a public interface. Unfortunately this means duplicating some code for now, but this is all internal and we can always clean it up later.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #4799 from andrewor14/conf-set-translate and squashes the following commits:
      
      11c525b [Andrew Or] Move warning to driver
      10e77b5 [Andrew Or] Add documentation for deprecation precedence
      a369cb1 [Andrew Or] Merge branch 'master' of github.com:apache/spark into conf-set-translate
      c26a9e3 [Andrew Or] Revert all translate logic in SparkConf
      fef6c9c [Andrew Or] Restore deprecation logic for spark.executor.userClassPathFirst
      94b4dfa [Andrew Or] Translate on get, not set
      258d154c
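
      A sketch of "translate on get, not set" with a hypothetical deprecation table: `set` stores exactly what the caller passed, and the deprecated-key fallback is applied only when a value is read.

      ```scala
      import scala.collection.mutable

      object TranslateOnGetSketch {
        private val deprecatedToNew = Map(
          "spark.yarn.user.classpath.first" -> "spark.executor.userClassPathFirst")
        private val settings = mutable.Map.empty[String, String]

        def set(key: String, value: String): Unit = settings(key) = value // no rewriting here

        def get(key: String): Option[String] =
          settings.get(key).orElse {
            // Fall back to a deprecated key only at read time.
            deprecatedToNew.collectFirst {
              case (oldKey, newKey) if newKey == key && settings.contains(oldKey) =>
                settings(oldKey)
            }
          }
      }
      ```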