  1. Jul 11, 2014
• [Minor] Remove unused val in Master · f4f46dec
      Andrew Or authored
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #1365 from andrewor14/master-fs and squashes the following commits:
      
      497f100 [Andrew Or] Sneak in a space and hope no one will notice
      05ba6da [Andrew Or] Remove unused val
      f4f46dec
• fix Graph partitionStrategy comment · 282cca0e
      CrazyJvm authored
      Author: CrazyJvm <crazyjvm@gmail.com>
      
      Closes #1368 from CrazyJvm/graph-comment-1 and squashes the following commits:
      
      d47f3c5 [CrazyJvm] fix style
      e190d6f [CrazyJvm] fix Graph partitionStrategy comment
      282cca0e
  2. Jul 10, 2014
• [SPARK-2358][MLLIB] Add an option to include native BLAS/LAPACK loader in the build · 2f59ce7d
      Xiangrui Meng authored
      It would be easy for users to include the netlib-java jniloader in the spark jar, which is LGPL-licensed. We can follow the same approach as ganglia support in Spark, which could be enabled by turning on "-Pganglia-lgpl" at build time. We can use "-Pnetlib-lgpl" flag for this.
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #1295 from mengxr/netlib-lgpl and squashes the following commits:
      
      aebf001 [Xiangrui Meng] add a profile to optionally include native BLAS/LAPACK loader in mllib
      2f59ce7d
• [SPARK-2428][SQL] Add except and intersect methods to SchemaRDD. · 10b59ba2
      Takuya UESHIN authored
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #1355 from ueshin/issues/SPARK-2428 and squashes the following commits:
      
      b6fa264 [Takuya UESHIN] Add except and intersect methods to SchemaRDD.
      10b59ba2
• [SPARK-2415] [SQL] RowWriteSupport should handle empty ArrayType correctly. · f5abd271
      Takuya UESHIN authored
`RowWriteSupport` doesn't write empty `ArrayType` values, so the value read back becomes `null`.
It should write empty `ArrayType` values as they are.
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #1339 from ueshin/issues/SPARK-2415 and squashes the following commits:
      
      32afc87 [Takuya UESHIN] Merge branch 'master' into issues/SPARK-2415
      2f05196 [Takuya UESHIN] Fix RowWriteSupport to handle empty ArrayType correctly.
      f5abd271
• [SPARK-2431][SQL] Refine StringComparison and related codes. · f62c4272
      Takuya UESHIN authored
      Refine `StringComparison` and related codes as follows:
      - `StringComparison` could be similar to `StringRegexExpression` or `CaseConversionExpression`.
      - Nullability of `StringRegexExpression` could depend on children's nullabilities.
      - Add a case that the like condition includes no wildcard to `LikeSimplification`.
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #1357 from ueshin/issues/SPARK-2431 and squashes the following commits:
      
      77766f5 [Takuya UESHIN] Add a case that the like condition includes no wildcard to LikeSimplification.
      b9da9d2 [Takuya UESHIN] Fix nullability of StringRegexExpression.
      680bb72 [Takuya UESHIN] Refine StringComparison.
      f62c4272
• SPARK-2427: Fix Scala examples that use the wrong command line arguments index · ae8ca4df
      Artjom-Metro authored
The Scala examples HBaseTest and HdfsTest don't use the correct indexes for the command line arguments. This is due to the fix of JIRA 1565, where these examples were not correctly adapted to the new usage of the submit script.
      
      Author: Artjom-Metro <Artjom-Metro@users.noreply.github.com>
      Author: Artjom-Metro <artjom31415@googlemail.com>
      
      Closes #1353 from Artjom-Metro/fix_examples and squashes the following commits:
      
      6111801 [Artjom-Metro] Reduce the default number of iterations
      cfaa73c [Artjom-Metro] Fix some examples that use the wrong index to access the command line arguments
      ae8ca4df
• [SPARK-1341] [Streaming] Throttle BlockGenerator to limit rate of data consumption. · 2dd67248
      Issac Buenrostro authored
      Author: Issac Buenrostro <buenrostro@ooyala.com>
      
      Closes #945 from ibuenros/SPARK-1341-throttle and squashes the following commits:
      
      5514916 [Issac Buenrostro] Formatting changes, added documentation for streaming throttling, stricter unit tests for throttling.
      62f395f [Issac Buenrostro] Add comments and license to streaming RateLimiter.scala
      7066438 [Issac Buenrostro] Moved throttle code to RateLimiter class, smoother pushing when throttling active
      ccafe09 [Issac Buenrostro] Throttle BlockGenerator to limit rate of data consumption.
      2dd67248
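The throttling idea above can be sketched in a few lines of plain Scala. This is not the actual Spark `RateLimiter` class, just an illustrative toy (`SimpleRateLimiter`, `tryAcquire`, and the fixed one-second window are all assumptions for the sketch):

```scala
// Minimal sketch of rate limiting: cap how many records are accepted per
// one-second window; records beyond the cap are rejected (Spark's real
// implementation blocks the producer instead).
final class SimpleRateLimiter(maxPerSecond: Int) {
  private var windowStart = 0L
  private var count = 0

  // Returns true if a record arriving at `nowMillis` fits under the cap.
  def tryAcquire(nowMillis: Long): Boolean = {
    if (nowMillis - windowStart >= 1000L) { windowStart = nowMillis; count = 0 }
    if (count < maxPerSecond) { count += 1; true } else false
  }
}
```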
• [SPARK-1478].3: Upgrade FlumeInputDStream's FlumeReceiver to support FLUME-1915 · 40a8fef4
      tmalaska authored
This is a modified version of PR https://github.com/apache/spark/pull/1168 by @tmalaska.
It adds MIMA binary check exclusions.
      
      Author: tmalaska <ted.malaska@cloudera.com>
      Author: Tathagata Das <tathagata.das1565@gmail.com>
      
      Closes #1347 from tdas/FLUME-1915 and squashes the following commits:
      
      96065df [Tathagata Das] Added Mima exclusion for FlumeReceiver.
      41d5338 [tmalaska] Address line 57 that was too long
      12617e5 [tmalaska] SPARK-1478: Upgrade FlumeInputDStream's Flume...
      40a8fef4
• name ec2 instances and security groups consistently · 369aa84e
      Nicholas Chammas authored
Security groups created by `spark-ec2` do not prepend "spark-" to the
name.
      
      Since naming the instances themselves is new to `spark-ec2`, it’s better
      to change that pattern to match the existing naming pattern for the
      security groups, rather than the other way around.
      
      Author: Nicholas Chammas <nicholas.chammas@gmail.com>
      Author: nchammas <nicholas.chammas@gmail.com>
      
      Closes #1344 from nchammas/master and squashes the following commits:
      
      f7e4581 [Nicholas Chammas] unrelated pep8 fix
      a36eed0 [Nicholas Chammas] name ec2 instances and security groups consistently
      de7292a [nchammas] Merge pull request #4 from apache/master
      2e4fe00 [nchammas] Merge pull request #3 from apache/master
      89fde08 [nchammas] Merge pull request #2 from apache/master
      69f6e22 [Nicholas Chammas] PEP8 fixes
      2627247 [Nicholas Chammas] broke up lines before they hit 100 chars
      6544b7e [Nicholas Chammas] [SPARK-2065] give launched instances names
      69da6cf [nchammas] Merge pull request #1 from apache/master
      369aa84e
• HOTFIX: Minor doc update for sbt change · 88006a62
      Patrick Wendell authored
      88006a62
• [SPARK-1776] Have Spark's SBT build read dependencies from Maven. · 628932b8
      Prashant Sharma authored
This patch introduces the new way of working while retaining the existing ways of doing things.
      
For example, the build instruction for YARN in Maven is
`mvn -Pyarn -Phadoop-2.2 clean package -DskipTests`
      in sbt it can become
      `MAVEN_PROFILES="yarn, hadoop-2.2" sbt/sbt clean assembly`
      Also supports
      `sbt/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 clean assembly`
      
      Author: Prashant Sharma <prashant.s@imaginea.com>
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #772 from ScrapCodes/sbt-maven and squashes the following commits:
      
      a8ac951 [Prashant Sharma] Updated sbt version.
      62b09bb [Prashant Sharma] Improvements.
      fa6221d [Prashant Sharma] Excluding sql from mima
      4b8875e [Prashant Sharma] Sbt assembly no longer builds tools by default.
72651ca [Prashant Sharma] Addresses code review comments.
      acab73d [Prashant Sharma] Revert "Small fix to run-examples script."
      ac4312c [Prashant Sharma] Revert "minor fix"
      6af91ac [Prashant Sharma] Ported oldDeps back. + fixes issues with prev commit.
65cf06c [Prashant Sharma] Servlet API jars mess up with the other servlet jars on the class path.
      446768e [Prashant Sharma] minor fix
      89b9777 [Prashant Sharma] Merge conflicts
      d0a02f2 [Prashant Sharma] Bumped up pom versions, Since the build now depends on pom it is better updated there. + general cleanups.
      dccc8ac [Prashant Sharma] updated mima to check against 1.0
      a49c61b [Prashant Sharma] Fix for tools jar
      a2f5ae1 [Prashant Sharma] Fixes a bug in dependencies.
      cf88758 [Prashant Sharma] cleanup
      9439ea3 [Prashant Sharma] Small fix to run-examples script.
      96cea1f [Prashant Sharma] SPARK-1776 Have Spark's SBT build read dependencies from Maven.
      36efa62 [Patrick Wendell] Set project name in pom files and added eclipse/intellij plugins.
      4973dbd [Patrick Wendell] Example build using pom reader.
      628932b8
• SPARK-2115: Stage kill link is too close to stage details link · c2babc08
      Masayoshi TSUZUKI authored
Moved the (kill) link to the right side and added a confirmation dialog shown when the (kill) link is clicked.
      
      Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>
      
      Closes #1350 from tsudukim/feature/SPARK-2115 and squashes the following commits:
      
      e2263b0 [Masayoshi TSUZUKI] Moved (kill) link to the right side. Add confirmation dialog when (kill) link is clicked.
      c2babc08
• Clean up SparkKMeans example's code · 2b18ea98
      Raymond Liu authored
      remove unused code
      
      Author: Raymond Liu <raymond.liu@intel.com>
      
      Closes #1352 from colorant/kmeans and squashes the following commits:
      
      ddcd1dd [Raymond Liu] Clean up SparkKMeans example's code
      2b18ea98
  3. Jul 09, 2014
• HOTFIX: Remove persistently failing test in master. · 553c578d
      Patrick Wendell authored
Apparently this functionality is going to be removed soon anyway.
      553c578d
• Revert "[HOTFIX] Synchronize on SQLContext.settings in tests." · dd22bc2d
      Patrick Wendell authored
      This reverts commit d4c30cd9.
      dd22bc2d
• SPARK-2416: Allow richer reporting of unit test results · 2e0a037d
      Patrick Wendell authored
      The built-in Jenkins integration is pretty bad. It's very confusing to users whether tests have passed or failed and we can't easily customize the message.
      
      With some small scripting around the Github API we can do much better than this.
      
      Author: Patrick Wendell <pwendell@gmail.com>
      
      Closes #1340 from pwendell/better-qa-messages and squashes the following commits:
      
      fd6077d [Patrick Wendell] Better automation for unit tests.
      2e0a037d
• SPARK-1782: svd for sparse matrix using ARPACK · 1f33e1f2
      Li Pu authored
      copy ARPACK dsaupd/dseupd code from latest breeze
      change RowMatrix to use sparse SVD
      change tests for sparse SVD
      
      All tests passed. I will run it against some large matrices.
      
      Author: Li Pu <lpu@twitter.com>
      Author: Xiangrui Meng <meng@databricks.com>
      Author: Li Pu <li.pu@outlook.com>
      
      Closes #964 from vrilleup/master and squashes the following commits:
      
      7312ec1 [Li Pu] very minor comment fix
      4c618e9 [Li Pu] Merge pull request #1 from mengxr/vrilleup-master
      a461082 [Xiangrui Meng] make superscript show up correctly in doc
      861ec48 [Xiangrui Meng] simplify axpy
      62969fa [Xiangrui Meng] use BDV directly in symmetricEigs change the computation mode to local-svd, local-eigs, and dist-eigs update tests and docs
      c273771 [Li Pu] automatically determine SVD compute mode and parameters
      7148426 [Li Pu] improve RowMatrix multiply
      5543cce [Li Pu] improve svd api
      819824b [Li Pu] add flag for dense svd or sparse svd
      eb15100 [Li Pu] fix binary compatibility
      4c7aec3 [Li Pu] improve comments
      e7850ed [Li Pu] use aggregate and axpy
      827411b [Li Pu] fix EOF new line
      9c80515 [Li Pu] use non-sparse implementation when k = n
      fe983b0 [Li Pu] improve scala style
      96d2ecb [Li Pu] improve eigenvalue sorting
      e1db950 [Li Pu] SPARK-1782: svd for sparse matrix using ARPACK
      1f33e1f2
• [SPARK-2417][MLlib] Fix DecisionTree tests · d35e3db2
      johnnywalleye authored
      Fixes test failures introduced by https://github.com/apache/spark/pull/1316.
      
      For both the regression and classification cases,
      val stats is the InformationGainStats for the best tree split.
      stats.predict is the predicted value for the data, before the split is made.
      Since 600 of the 1,000 values generated by DecisionTreeSuite.generateCategoricalDataPoints() are 1.0 and the rest 0.0, the regression tree and classification tree both correctly predict a value of 0.6 for this data now, and the assertions have been changed to reflect that.
      
      Author: johnnywalleye <jsondag@gmail.com>
      
      Closes #1343 from johnnywalleye/decision-tree-tests and squashes the following commits:
      
      ef80603 [johnnywalleye] [SPARK-2417][MLlib] Fix DecisionTree tests
      d35e3db2
• [STREAMING] SPARK-2343: Fix QueueInputDStream with oneAtATime false · 0eb11527
      Manuel Laflamme authored
      Fix QueueInputDStream which was not removing dequeued items when used with the oneAtATime flag disabled.
      
      Author: Manuel Laflamme <manuel.laflamme@gmail.com>
      
      Closes #1285 from mlaflamm/spark-2343 and squashes the following commits:
      
      61c9e38 [Manuel Laflamme] Unit tests for queue input stream
      c51d029 [Manuel Laflamme] Fix QueueInputDStream with oneAtATime false
      0eb11527
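The fixed behaviour can be sketched with a plain `mutable.Queue` (this is an illustrative toy, not the actual `QueueInputDStream` code; `nextBatch` is a hypothetical name): with `oneAtATime` disabled, every queued item must actually be dequeued for the batch, not merely read and left in the queue.

```scala
import scala.collection.mutable

// Sketch: one item per batch when oneAtATime is true; otherwise drain the
// whole queue so dequeued items are removed, as the fix above ensures.
def nextBatch[T](queue: mutable.Queue[T], oneAtATime: Boolean): Seq[T] =
  if (oneAtATime) {
    if (queue.isEmpty) Seq.empty else Seq(queue.dequeue())
  } else {
    val buf = Seq.newBuilder[T]
    while (queue.nonEmpty) buf += queue.dequeue()
    buf.result()
  }
```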
• [SPARK-2384] Add tooltips to UI. · 339441f5
      Kay Ousterhout authored
      This patch adds tooltips to clarify some points of confusion in the UI.  When users mouse over some of the table headers (shuffle read, write, and input size) as well as over the "scheduler delay" metric shown for each stage, a black tool tip (see image below) pops up describing the metric in more detail.  After the tooltip mechanism is added by this commit, I imagine others may want to add more tooltips for other things in the UI, but I think this is a good starting point.
      
      ![tooltip](https://cloud.githubusercontent.com/assets/1108612/3491905/994e179e-059f-11e4-92f2-c6c12d248d81.jpg)
      
      This looks scary-big but much of it is adding the bootstrap tool tip JavaScript.
      
      Also I have no idea what to put for the license in tooltip (I left it the same -- the Twitter apache header) or for JQuery (left it as nothing) -- @mateiz what's the right thing here?
      
      cc @pwendell @andrewor14 @rxin
      
      Author: Kay Ousterhout <kayousterhout@gmail.com>
      
      Closes #1314 from kayousterhout/tooltips and squashes the following commits:
      
      19981b5 [Kay Ousterhout] Exclude non-licensed javascript files from style check
      d9ab5a9 [Kay Ousterhout] Response to Andrew's review
      7752449 [Kay Ousterhout] [SPARK-2384] Add tooltips to UI.
      339441f5
  4. Jul 08, 2014
• [SPARK-2152][MLlib] fix bin offset in DecisionTree node aggregations (also resolves SPARK-2160) · 1114207c
      johnnywalleye authored
      Hi, this pull fixes (what I believe to be) a bug in DecisionTree.scala.
      
      In the extractLeftRightNodeAggregates function, the first set of rightNodeAgg values for Regression are set in line 792 as follows:
      
rightNodeAgg(featureIndex)(2 * (numBins - 2))
  = binData(shift + (2 * numBins - 1))
      
      Then there is a loop that sets the rest of the values, as in line 809:
      
      rightNodeAgg(featureIndex)(2 * (numBins - 2 - splitIndex)) =
        binData(shift + (2 *(numBins - 2 - splitIndex))) +
        rightNodeAgg(featureIndex)(2 * (numBins - 1 - splitIndex))
      
      But since splitIndex starts at 1, this ends up skipping a set of binData values.
      
      The changes here address this issue, for both the Regression and Classification cases.
      
      Author: johnnywalleye <jsondag@gmail.com>
      
      Closes #1316 from johnnywalleye/master and squashes the following commits:
      
      73809da [johnnywalleye] fix bin offset in DecisionTree node aggregations
      1114207c
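A simplified, hypothetical illustration of the off-by-one (one feature and one statistic per bin, rather than the real interleaved layout): the right node aggregate for split i should be the sum of bins i+1 through numBins-1, but indexing the wrong bin inside the loop skips a bin's worth of data.

```scala
// Buggy variant: the loop reads bin (n - 2 - splitIndex), so one bin's data
// is never added into the aggregates.
def buggyRightAgg(binData: Array[Double]): Array[Double] = {
  val n = binData.length
  val agg = new Array[Double](n - 1)
  agg(n - 2) = binData(n - 1)
  for (splitIndex <- 1 until n - 1)
    agg(n - 2 - splitIndex) = binData(n - 2 - splitIndex) + agg(n - 1 - splitIndex)
  agg
}

// Fixed variant: each step adds bin (n - 1 - splitIndex), so agg(i) is the
// sum of bins i+1 .. n-1.
def fixedRightAgg(binData: Array[Double]): Array[Double] = {
  val n = binData.length
  val agg = new Array[Double](n - 1)
  agg(n - 2) = binData(n - 1)
  for (splitIndex <- 1 until n - 1)
    agg(n - 2 - splitIndex) = binData(n - 1 - splitIndex) + agg(n - 1 - splitIndex)
  agg
}
```

With bins `[1, 2, 3, 4]` the fixed version yields `[9, 7, 4]`, while the buggy version yields `[7, 6, 4]` because bin 2 (value 3) is skipped.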
• [SPARK-2413] Upgrade junit_xml_listener to 0.5.1 · ac9cdc11
      DB Tsai authored
which fixes the following issues:

1) fix the class name to be the fully qualified classpath
2) make sure the reporting time is in seconds, not milliseconds, which was causing the JUnit HTML report to show incorrect numbers
3) make sure the durations of the tests are cumulative.
      
      Author: DB Tsai <dbtsai@alpinenow.com>
      
      Closes #1333 from dbtsai/dbtsai-junit and squashes the following commits:
      
      bbeac4b [DB Tsai] Upgrade junit_xml_listener to 0.5.1 which fixes the following issues
      ac9cdc11
• [SPARK-2392] Executors should not start their own HTTP servers · bf04a390
      Andrew Or authored
      Executors currently start their own unused HTTP file servers. This is because we use the same SparkEnv class for both executors and drivers, and we do not distinguish this case.
      
      In the longer term, we should separate out SparkEnv for the driver and SparkEnv for the executors.
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #1335 from andrewor14/executor-http-server and squashes the following commits:
      
      46ef263 [Andrew Or] Start HTTP server only on the driver
      bf04a390
• [SPARK-2362] Fix for newFilesOnly logic in file DStream · e6f7bfcf
      Gabriele Nizzoli authored
The newFilesOnly logic should be inverted: if the flag newFilesOnly==true, then only files newer than the current time should be read. As the code is now, if newFilesOnly==true the threshold is 0L, so it reads every file in the directory.
      
      Author: Gabriele Nizzoli <mail@nizzoli.net>
      
      Closes #1077 from gabrielenizzoli/master and squashes the following commits:
      
      4f1d261 [Gabriele Nizzoli] Fix for newFilesOnly logic in file DStream
      e6f7bfcf
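The corrected threshold logic can be sketched in plain Scala (illustrative names like `modTimeThreshold` and `selectFiles`, not the actual DStream internals): with `newFilesOnly == true`, only files modified at or after the stream's start time are selected; with `false`, every file qualifies.

```scala
// Threshold below which files are ignored: the stream's start time when
// only new files are wanted, otherwise 0L (accept everything).
def modTimeThreshold(newFilesOnly: Boolean, startTime: Long): Long =
  if (newFilesOnly) startTime else 0L

// Select the files whose modification time passes the threshold.
def selectFiles(modTimes: Map[String, Long],
                newFilesOnly: Boolean,
                startTime: Long): Set[String] = {
  val threshold = modTimeThreshold(newFilesOnly, startTime)
  modTimes.collect { case (name, t) if t >= threshold => name }.toSet
}
```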
• [SPARK-2409] Make SQLConf thread safe. · 32516f86
      Reynold Xin authored
      Author: Reynold Xin <rxin@apache.org>
      
      Closes #1334 from rxin/sqlConfThreadSafetuy and squashes the following commits:
      
      c1e0a5a [Reynold Xin] Fixed the duplicate comment.
      7614372 [Reynold Xin] [SPARK-2409] Make SQLConf thread safe.
      32516f86
• SPARK-2400 : fix spark.yarn.max.executor.failures explanation · b520b645
      CrazyJvm authored
      According to
      ```scala
        private val maxNumExecutorFailures = sparkConf.getInt("spark.yarn.max.executor.failures",
          sparkConf.getInt("spark.yarn.max.worker.failures", math.max(args.numExecutors * 2, 3)))
      ```
the default value should be numExecutors * 2, with a minimum of 3, and the same applies to the config
`spark.yarn.max.worker.failures`
      
      Author: CrazyJvm <crazyjvm@gmail.com>
      
      Closes #1282 from CrazyJvm/yarn-doc and squashes the following commits:
      
      1a5f25b [CrazyJvm] remove deprecated config
      c438aec [CrazyJvm] fix style
      86effa6 [CrazyJvm] change expression
      211f130 [CrazyJvm] fix html tag
      2900d23 [CrazyJvm] fix style
      a4b2e27 [CrazyJvm] fix configuration spark.yarn.max.executor.failures
      b520b645
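The default quoted above boils down to a one-liner, shown here as a worked example (the function name is illustrative, not Spark's):

```scala
// Default max executor failures: twice the requested executors, floored at 3.
def defaultMaxExecutorFailures(numExecutors: Int): Int =
  math.max(numExecutors * 2, 3)
```

So a job requesting 1 executor tolerates 3 failures, while one requesting 8 tolerates 16.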
• [SPARK-2403] Catch all errors during serialization in DAGScheduler · c8a2313c
      Daniel Darabos authored
      https://issues.apache.org/jira/browse/SPARK-2403
      
      Spark hangs for us whenever we forget to register a class with Kryo. This should be a simple fix for that. But let me know if you have a better suggestion.
      
      I did not write a new test for this. It would be pretty complicated and I'm not sure it's worthwhile for such a simple change. Let me know if you disagree.
      
      Author: Daniel Darabos <darabos.daniel@gmail.com>
      
      Closes #1329 from darabos/spark-2403 and squashes the following commits:
      
      3aceaad [Daniel Darabos] Print full stack trace for miscellaneous exceptions during serialization.
      52c22ba [Daniel Darabos] Only catch NonFatal exceptions.
      361e962 [Daniel Darabos] Catch all errors during serialization in DAGScheduler.
      c8a2313c
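The pattern behind the fix can be sketched without any Spark code (names like `trySerialize` are assumptions for the sketch, not the DAGScheduler API): attempt serialization, catch only `NonFatal` errors, and surface them as a failure instead of letting the job hang.

```scala
import scala.util.control.NonFatal

// Attempt to serialize a value; a NonFatal error (e.g. a missing Kryo
// registration) is turned into a Left instead of escaping the scheduler.
def trySerialize[T](value: T)(serialize: T => Array[Byte]): Either[String, Array[Byte]] =
  try Right(serialize(value))
  catch { case NonFatal(e) => Left(s"Task not serializable: ${e.getMessage}") }
```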
• [SPARK-2395][SQL] Optimize common LIKE patterns. · cc3e0a14
      Michael Armbrust authored
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #1325 from marmbrus/slowLike and squashes the following commits:
      
      023c3eb [Michael Armbrust] add comment.
      8b421c2 [Michael Armbrust] Handle the case where the final % is actually escaped.
      d34d37e [Michael Armbrust] add periods.
      3bbf35f [Michael Armbrust] Roll back changes to SparkBuild
      53894b1 [Michael Armbrust] Fix grammar.
      4094462 [Michael Armbrust] Fix grammar.
      6d3d0a0 [Michael Armbrust] Optimize common LIKE patterns.
      cc3e0a14
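The idea of the optimization can be sketched as follows (an illustrative toy, not Catalyst's `LikeSimplification` rule): LIKE patterns with a single leading or trailing `%` can be answered with cheap string operations instead of compiling a regular expression.

```scala
// Evaluate a SQL LIKE pattern, using fast paths for the common shapes:
// no wildcard -> equality; "abc%" -> startsWith; "%abc" -> endsWith;
// "%abc%" -> contains; anything else falls back to a regex.
def optimizedLike(value: String, pattern: String): Boolean = {
  def plain(s: String) = !s.contains("%") && !s.contains("_")
  pattern match {
    case p if plain(p) => value == p
    case p if p.startsWith("%") && p.endsWith("%") && p.length >= 2 &&
              plain(p.substring(1, p.length - 1)) =>
      value.contains(p.substring(1, p.length - 1))
    case p if p.endsWith("%") && plain(p.dropRight(1)) => value.startsWith(p.dropRight(1))
    case p if p.startsWith("%") && plain(p.drop(1))    => value.endsWith(p.drop(1))
    case p => value.matches(p.replace("%", ".*").replace("_", "."))
  }
}
```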
• [EC2] Add default history server port to ec2 script · 56e009d4
      Andrew Or authored
      Right now I have to open it manually
      
      Author: Andrew Or <andrewor14@gmail.com>
      
      Closes #1296 from andrewor14/hist-serv-port and squashes the following commits:
      
      8895a1f [Andrew Or] Add default history server port to ec2 script
      56e009d4
• [SPARK-2391][SQL] Custom take() for LIMIT queries. · 5a406364
      Michael Armbrust authored
Using Spark's take can result in an entire in-memory partition being shipped in order to retrieve a single row.
      
      Author: Michael Armbrust <michael@databricks.com>
      
      Closes #1318 from marmbrus/takeLimit and squashes the following commits:
      
      77289a5 [Michael Armbrust] Update scala doc
      32f0674 [Michael Armbrust] Custom take implementation for LIMIT queries.
      5a406364
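The incremental idea can be sketched on plain collections (partitions modeled as `Seq[Seq[T]]`; `takeLimit` is a hypothetical name, not the SchemaRDD API): pull rows partition by partition and stop as soon as the limit is reached, rather than materializing whole partitions.

```scala
// Take up to `limit` rows, consuming partitions lazily from the front and
// stopping early once enough rows have been collected.
def takeLimit[T](partitions: Seq[Seq[T]], limit: Int): Seq[T] = {
  val buf = Seq.newBuilder[T]
  var remaining = limit
  val it = partitions.iterator
  while (remaining > 0 && it.hasNext) {
    val chunk = it.next().take(remaining)
    buf ++= chunk
    remaining -= chunk.length
  }
  buf.result()
}
```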
• Resolve sbt warnings during build Ⅱ · 3cd5029b
      witgo authored
      Author: witgo <witgo@qq.com>
      
      Closes #1153 from witgo/expectResult and squashes the following commits:
      
      97541d8 [witgo] merge master
      ead26e7 [witgo] Resolve sbt warnings during build
      3cd5029b
• Updated programming-guide.md · 0128905e
      Rishi Verma authored
Made sure that readers know the random number generator seed argument of the 'takeSample' method is optional.
      
      Author: Rishi Verma <riverma@apache.org>
      
      Closes #1324 from riverma/patch-1 and squashes the following commits:
      
      4699676 [Rishi Verma] Updated programming-guide.md
      0128905e
  5. Jul 07, 2014
• [SPARK-2235][SQL] Spark SQL basicOperator add Intersect operator · 50561f43
      Yanjie Gao authored
Hi all,
I want to submit a basic operator, Intersect.
For example, in SQL:
select * from table1
intersect
select * from table2
I want this operator to support this function in Spark SQL.
This operator returns the intersection of the SparkPlan child table RDDs.
JIRA: https://issues.apache.org/jira/browse/SPARK-2235
      
      Author: Yanjie Gao <gaoyanjie55@163.com>
      Author: YanjieGao <396154235@qq.com>
      
      Closes #1150 from YanjieGao/patch-5 and squashes the following commits:
      
      4629afe [YanjieGao] reformat the code
      bdc2ac0 [YanjieGao] reformat the code as Michael's suggestion
      3b29ad6 [YanjieGao] Merge remote branch 'upstream/master' into patch-5
      1cfbfe6 [YanjieGao] refomat some files
      ea78f33 [YanjieGao] resolve conflict and add annotation on basicOperator and remove HiveQl
      0c7cca5 [YanjieGao] modify format problem
      a802ca8 [YanjieGao] Merge remote branch 'upstream/master' into patch-5
      5e374c7 [YanjieGao] resolve conflict in SparkStrategies and basicOperator
      f7961f6 [Yanjie Gao] update the line less than
      bdc4a05 [Yanjie Gao] Update basicOperators.scala
      0b49837 [Yanjie Gao] delete the annotation
      f1288b4 [Yanjie Gao] delete annotation
      e2b64be [Yanjie Gao] Update basicOperators.scala
      4dd453e [Yanjie Gao] Update SQLQuerySuite.scala
      790765d [Yanjie Gao] Update SparkStrategies.scala
      ac73e60 [Yanjie Gao] Update basicOperators.scala
      d4ac5e5 [Yanjie Gao] Update HiveQl.scala
      61e88e7 [Yanjie Gao] Update SqlParser.scala
      469f099 [Yanjie Gao] Update basicOperators.scala
      e5bff61 [Yanjie Gao] Spark SQL basicOperator add Intersect operator
      50561f43
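The operator's semantics can be sketched on plain collections (an illustrative toy, not the actual `basicOperators.scala` code): SQL INTERSECT is set-based, so it keeps the rows that appear in both children, with duplicates removed.

```scala
// Rows present in both inputs, first-occurrence order, no duplicates.
def intersectRows[T](left: Seq[T], right: Seq[T]): Seq[T] =
  left.distinct.filter(right.toSet)
```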
• [SPARK-2376][SQL] Selecting list values inside nested JSON objects raises... · 4352a2fd
      Yin Huai authored
      [SPARK-2376][SQL] Selecting list values inside nested JSON objects raises java.lang.IllegalArgumentException
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-2376
      
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #1320 from yhuai/SPARK-2376 and squashes the following commits:
      
      0107417 [Yin Huai] Merge remote-tracking branch 'upstream/master' into SPARK-2376
      480803d [Yin Huai] Correctly handling JSON arrays in PySpark.
      4352a2fd
• [SPARK-2375][SQL] JSON schema inference may not resolve type conflicts... · f0496ee1
      Yin Huai authored
      [SPARK-2375][SQL] JSON schema inference may not resolve type conflicts correctly for a field inside an array of structs
      
      For example, for
      ```
      {"array": [{"field":214748364700}, {"field":1}]}
      ```
the type of field is resolved as IntType, while for
      ```
      {"array": [{"field":1}, {"field":214748364700}]}
      ```
      the type of field is resolved as LongType.
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-2375
      
      Author: Yin Huai <huaiyin.thu@gmail.com>
      
      Closes #1308 from yhuai/SPARK-2375 and squashes the following commits:
      
      3e2e312 [Yin Huai] Update unit test.
      1b2ff9f [Yin Huai] Merge remote-tracking branch 'upstream/master' into SPARK-2375
      10794eb [Yin Huai] Correctly resolve the type of a field inside an array of structs.
      f0496ee1
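The required behaviour can be sketched with a two-type toy model (illustrative names, not the actual Spark SQL data types): conflict resolution must commute, so the inferred type is the same whichever record is seen first.

```scala
// Toy type lattice: Int widens to Long on conflict.
sealed trait JsonType
case object IntType extends JsonType
case object LongType extends JsonType

// Order-independent conflict resolution: Long wins over Int.
def widen(a: JsonType, b: JsonType): JsonType =
  if (a == LongType || b == LongType) LongType else IntType

// Infer the type of a field from all the values observed for it.
def inferFieldType(values: Seq[Long]): JsonType =
  values
    .map(v => if (v >= Int.MinValue.toLong && v <= Int.MaxValue.toLong) IntType else LongType)
    .reduce(widen)
```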
• [SPARK-2386] [SQL] RowWriteSupport should use the exact types to cast. · 4deeed17
      Takuya UESHIN authored
When executing `saveAsParquetFile` with a non-primitive type, `RowWriteSupport` uses the wrong type `Int` for `ByteType` and `ShortType`.
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #1315 from ueshin/issues/SPARK-2386 and squashes the following commits:
      
      20d89ec [Takuya UESHIN] Use None instead of null.
      bd88741 [Takuya UESHIN] Add a test.
      323d1d2 [Takuya UESHIN] Modify RowWriteSupport to use the exact types to cast.
      4deeed17
• [SPARK-2339][SQL] SQL parser in sql-core is case sensitive, but a table alias... · c0b4cf09
      Yin Huai authored
      [SPARK-2339][SQL] SQL parser in sql-core is case sensitive, but a table alias is converted to lower case when we create Subquery
      
      Reported by http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-Join-throws-exception-td8599.html
After we get the table from the catalog, because the table has an alias, we temporarily insert a Subquery. Then, we convert the table alias to lower case regardless of whether the parser is case sensitive.
      To see the issue ...
      ```
      val sqlContext = new org.apache.spark.sql.SQLContext(sc)
      import sqlContext.createSchemaRDD
      
      case class Person(name: String, age: Int)
      
      val people = sc.textFile("examples/src/main/resources/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt))
      people.registerAsTable("people")
      
      sqlContext.sql("select PEOPLE.name from people PEOPLE")
      ```
      The plan is ...
      ```
      == Query Plan ==
      Project ['PEOPLE.name]
       ExistingRdd [name#0,age#1], MapPartitionsRDD[4] at mapPartitions at basicOperators.scala:176
      ```
      You can find that `PEOPLE.name` is not resolved.
      
      This PR introduces three changes.
1.  If a table has an alias, the catalog will not lowercase the alias. If a lowercase alias is needed, the analyzer will do the work.
2.  A catalog has a new val caseSensitive that indicates whether the catalog is case sensitive. For example, a SimpleCatalog is case sensitive.
3.  Corresponding unit tests.
With this PR, case sensitivity of database names and table names is handled by the catalog. Case sensitivity of other identifiers is handled by the analyzer.
      
      JIRA: https://issues.apache.org/jira/browse/SPARK-2339
      
      Author: Yin Huai <huai@cse.ohio-state.edu>
      
      Closes #1317 from yhuai/SPARK-2339 and squashes the following commits:
      
      12d8006 [Yin Huai] Handling case sensitivity correctly. This patch introduces three changes. 1. If a table has an alias, the catalog will not lowercase the alias. If a lowercase alias is needed, the analyzer will do the work. 2. A catalog has a new val caseSensitive that indicates if this catalog is case sensitive or not. For example, a SimpleCatalog is case sensitive, but 3. Corresponding unit tests. With this patch, case sensitivity of database names and table names is handled by the catalog. Case sensitivity of other identifiers is handled by the analyzer.
      c0b4cf09
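The division of labour described above can be sketched in two tiny functions (illustrative names, not the actual catalog or analyzer API): the catalog stores the alias verbatim, and only the analyzer lowercases it, and only when the dialect is case insensitive.

```scala
// Catalog: keep the alias exactly as the user wrote it.
def catalogAlias(alias: String): String = alias

// Analyzer: lowercase only when the dialect is case insensitive.
def analyzerAlias(alias: String, caseSensitive: Boolean): String =
  if (caseSensitive) alias else alias.toLowerCase
```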
• [SPARK-1977][MLLIB] register mutable BitSet in MovieLenseALS · f7ce1b3b
      Neville Li authored
      Author: Neville Li <neville@spotify.com>
      
      Closes #1319 from nevillelyh/gh/SPARK-1977 and squashes the following commits:
      
      1f0a355 [Neville Li] [SPARK-1977][MLLIB] register mutable BitSet in MovieLenseALS
      f7ce1b3b
  6. Jul 05, 2014
• [SPARK-2327] [SQL] Fix nullabilities of Join/Generate/Aggregate. · 9d5ecf82
      Takuya UESHIN authored
      Fix nullabilities of `Join`/`Generate`/`Aggregate` because:
      - Output attributes of opposite side of `OuterJoin` should be nullable.
- Output attributes of the generator side of `Generate` should be nullable if `join` is `true` and `outer` is `true`.
      - `AttributeReference` of `computedAggregates` of `Aggregate` should be the same as `aggregateExpression`'s.
      
      Author: Takuya UESHIN <ueshin@happy-camper.st>
      
      Closes #1266 from ueshin/issues/SPARK-2327 and squashes the following commits:
      
      3ace83a [Takuya UESHIN] Add withNullability to Attribute and use it to change nullabilities.
      df1ae53 [Takuya UESHIN] Modify nullabilize to leave attribute if not resolved.
      799ce56 [Takuya UESHIN] Add nullabilization to Generate of SparkPlan.
      a0fc9bc [Takuya UESHIN] Fix scalastyle errors.
      0e31e37 [Takuya UESHIN] Fix Aggregate resultAttribute nullabilities.
      09532ec [Takuya UESHIN] Fix Generate output nullabilities.
      f20f196 [Takuya UESHIN] Fix Join output nullabilities.
      9d5ecf82