  1. Jul 07, 2015
    • [SPARK-8704] [ML] [PySpark] Add missing methods in StandardScaler · 35d781e7
      MechCoder authored
      Add std, mean to StandardScalerModel
      getVectors, findSynonyms to Word2Vec Model
      setFeatures and getFeatures to hashingTF
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #7086 from MechCoder/missing_model_methods and squashes the following commits:
      
      9fbae90 [MechCoder] Add type
      6e3d6b2 [MechCoder] [SPARK-8704] Add missing methods in StandardScaler (ML and PySpark)
      35d781e7
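As context for what the newly exposed `std` and `mean` give you, here is a rough plain-Python sketch of the standardization a StandardScaler performs. This is illustrative only, not the PySpark API; it assumes the unbiased (sample) standard deviation per column.

```python
# Sketch: per-column mean and sample std, then (x - mean) / std per feature.
# Illustrative only; not Spark's implementation.
def fit_standard_scaler(rows):
    n = len(rows)
    dims = len(rows[0])
    mean = [sum(r[d] for r in rows) / n for d in range(dims)]
    # Unbiased (sample) standard deviation, divisor n - 1.
    std = [(sum((r[d] - mean[d]) ** 2 for r in rows) / (n - 1)) ** 0.5
           for d in range(dims)]
    return mean, std

def transform(rows, mean, std):
    # Columns with zero std are mapped to 0.0 to avoid division by zero.
    return [[(x - m) / s if s != 0 else 0.0 for x, m, s in zip(r, mean, std)]
            for r in rows]

mean, std = fit_standard_scaler([[1.0, 10.0], [3.0, 30.0]])
scaled = transform([[3.0, 30.0]], mean, std)
```

With the model fit, `mean` and `std` are exactly the per-column statistics the new accessors expose.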
    • [SPARK-8559] [MLLIB] Support Association Rule Generation · 3336c7b1
      Feynman Liang authored
      Distributed generation of single-consequent association rules from a RDD of frequent itemsets. Tests referenced against `R`'s implementation of A Priori in [arules](http://cran.r-project.org/web/packages/arules/index.html).
      
      Author: Feynman Liang <fliang@databricks.com>
      
      Closes #7005 from feynmanliang/fp-association-rules-distributed and squashes the following commits:
      
      466ced0 [Feynman Liang] Refactor AR generation impl
      73c1cff [Feynman Liang] Make rule attributes public, remove numTransactions from FreqItemset
      80f63ff [Feynman Liang] Change default confidence and optimize imports
      04cf5b5 [Feynman Liang] Code review with @mengxr, add R to tests
      0cc1a6a [Feynman Liang] Java compatibility test
      f3c14b5 [Feynman Liang] Fix MiMa test
      764375e [Feynman Liang] Fix tests
      1187307 [Feynman Liang] Almost working tests
      b20779b [Feynman Liang] Working implementation
      5395c4e [Feynman Liang] Fix imports
      2d34405 [Feynman Liang] Partial implementation of distributed ar
      83ace4b [Feynman Liang] Local rule generation without pruning complete
      69c2c87 [Feynman Liang] Working local implementation, now to parallelize../..
      4e1ec9a [Feynman Liang] Pull FreqItemsets out, refactor type param, tests
      69ccedc [Feynman Liang] First implementation of association rule generation
      3336c7b1
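For intuition, single-consequent rule generation from frequent itemsets can be sketched in a few lines of plain Python. This is a local, non-distributed toy, not the RDD implementation; the `freq` table and the 0.7 confidence threshold are made up for illustration.

```python
# For each frequent itemset {a, b, ...}, emit rules (itemset - {c}) => c with
# confidence support(itemset) / support(antecedent), keeping those above the
# threshold. freq maps frozenset -> support count.
def association_rules(freq, min_confidence=0.8):
    rules = []
    for itemset, count in freq.items():
        if len(itemset) < 2:
            continue
        for consequent in itemset:
            antecedent = itemset - {consequent}
            conf = count / freq[antecedent]
            if conf >= min_confidence:
                rules.append((antecedent, consequent, conf))
    return rules

freq = {
    frozenset({"a"}): 4,
    frozenset({"b"}): 3,
    frozenset({"a", "b"}): 3,
}
rules = association_rules(freq, min_confidence=0.7)
```

From `{a, b}` this yields `{a} => b` with confidence 3/4 and `{b} => a` with confidence 3/3.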
    • [SPARK-8821] [EC2] Switched to binary mode for file reading · 70beb808
      Simon Hafner authored
      
Otherwise, when a UTF-8 locale is set, the script crashes with:

    - Downloading boto...
    Traceback (most recent call last):
      File "ec2/spark_ec2.py", line 148, in <module>
        setup_external_libs(external_libs)
      File "ec2/spark_ec2.py", line 128, in setup_external_libs
        if hashlib.md5(tar.read()).hexdigest() != lib["md5"]:
      File "/usr/lib/python3.4/codecs.py", line 319, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
      
      Author: Simon Hafner <hafnersimon@gmail.com>
      
      Closes #7215 from reactormonk/branch-1.4 and squashes the following commits:
      
      e86957a [Simon Hafner] [SPARK-8821] [EC2] Switched to binary mode
      
      (cherry picked from commit 83a621a5)
      Signed-off-by: default avatarShivaram Venkataraman <shivaram@cs.berkeley.edu>
      70beb808
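Why binary mode matters here, as a minimal sketch: `hashlib.md5` needs raw bytes, and opening a gzip tarball in text mode under a UTF-8 locale makes Python 3 try (and fail) to decode it. The file content below is made-up gzip magic bytes, not the real boto download.

```python
# Hashing must see raw bytes: open with "rb", never text mode.
import hashlib
import os
import tempfile

payload = b"\x1f\x8b\x08\x00"  # gzip magic bytes; not valid UTF-8
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name

with open(path, "rb") as tar:  # binary mode, as in the fix
    digest = hashlib.md5(tar.read()).hexdigest()

os.unlink(path)
```

Opening the same file with `open(path)` (text mode) would raise the UnicodeDecodeError shown in the traceback above, since 0x8b cannot start a UTF-8 sequence.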
    • [SPARK-8823] [MLLIB] [PYSPARK] Optimizations for SparseVector dot products · 738c1074
      MechCoder authored
      Follow up for https://github.com/apache/spark/pull/5946
      
Currently we iterate over the indices and values of a SparseVector in Python; this can be vectorized.
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #7222 from MechCoder/sparse_optim and squashes the following commits:
      
      dcb51d3 [MechCoder] [SPARK-8823] [MLlib] [PySpark] Optimizations for SparseVector dot product
      738c1074
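For reference, here is the operation being optimized, as a pure-Python sketch: a sparse vector stored as parallel `(indices, values)` arrays dotted with a dense vector. The PR's point is that the per-element Python loop below can be replaced by array operations (conceptually `dense[indices].dot(values)` in NumPy); the function name is ours, not Spark's.

```python
# Sparse . dense dot product over (indices, values) pairs. This Python-level
# loop is what vectorization replaces with a single NumPy gather + dot.
def sparse_dot(indices, values, dense):
    return sum(v * dense[i] for i, v in zip(indices, values))

# Sparse vector of size 4 with nonzeros at positions 0 and 3.
result = sparse_dot([0, 3], [2.0, 4.0], [1.0, 9.0, 9.0, 0.5])
```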
    • [SPARK-8711] [ML] Add additional methods to PySpark ML tree models · 1dbc4a15
      MechCoder authored
Add numNodes and depth to tree models, and treeWeights to ensemble models.
Add __repr__ to all models.
      
      Author: MechCoder <manojkumarsivaraj334@gmail.com>
      
      Closes #7095 from MechCoder/missing_methods_tree and squashes the following commits:
      
      23b08be [MechCoder] private [spark]
      38a0860 [MechCoder] rename pyTreeWeights to javaTreeWeights
      6d16ad8 [MechCoder] Fix Python 3 Error
      47d7023 [MechCoder] Use np.allclose and treeEnsembleModel -> TreeEnsembleMethods
      819098c [MechCoder] [SPARK-8711] [ML] Add additional methods ot PySpark ML tree models
      1dbc4a15
    • [SPARK-8570] [MLLIB] [DOCS] Improve MLlib Local Matrix Documentation. · 0a63d7ab
      Mike Dusenberry authored
      Updated MLlib Data Types Local Matrix section to include information on sparse matrices, added sparse matrix examples to the Scala and Java examples, and added Python examples for both dense and sparse matrices.
      
      Author: Mike Dusenberry <mwdusenb@us.ibm.com>
      
      Closes #6958 from dusenberrymw/Improve_MLlib_Local_Matrix_Documentation and squashes the following commits:
      
      ceae407 [Mike Dusenberry] Updated MLlib Data Types Local Matrix section to include information on sparse matrices, added sparse matrix examples to the Scala and Java examples, and added Python examples for both dense and sparse matrices.
      0a63d7ab
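The sparse matrices those docs cover are stored in compressed sparse column (CSC) form: `colPtrs`, `rowIndices`, `values`. A minimal sketch of how that layout maps back to a dense matrix (illustrative Python, not MLlib's API; the example matrix is made up):

```python
# Reconstruct a dense num_rows x num_cols matrix from CSC arrays.
# col_ptrs[c]..col_ptrs[c+1] delimits column c's slice of row_indices/values.
def csc_to_dense(num_rows, num_cols, col_ptrs, row_indices, values):
    dense = [[0.0] * num_cols for _ in range(num_rows)]
    for col in range(num_cols):
        for k in range(col_ptrs[col], col_ptrs[col + 1]):
            dense[row_indices[k]][col] = values[k]
    return dense

# 3x2 matrix with nonzeros (0,0)=9, (1,1)=8, (2,1)=6.
dense = csc_to_dense(3, 2, [0, 1, 3], [0, 1, 2], [9.0, 8.0, 6.0])
```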
    • [SPARK-8788] [ML] Add Java unit test for PCA transformer · d73bc08d
      Yanbo Liang authored
      Add Java unit test for PCA transformer
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #7184 from yanboliang/spark-8788 and squashes the following commits:
      
      9d1a2af [Yanbo Liang] address comments
      b34451f [Yanbo Liang] Add Java unit test for PCA transformer
      d73bc08d
    • [SPARK-6731] [CORE] Addendum: Upgrade Apache commons-math3 to 3.4.1 · dcbd85b7
      Sean Owen authored
      (This finishes the job by removing the version overridden by Hadoop profiles.)
      
      See discussion at https://github.com/apache/spark/pull/6994#issuecomment-119113167
      
      Author: Sean Owen <sowen@cloudera.com>
      
      Closes #7261 from srowen/SPARK-6731.2 and squashes the following commits:
      
      5a3f59e [Sean Owen] Finish updating Commons Math3 to 3.4.1 from 3.1.1
      dcbd85b7
    • [HOTFIX] Rename release-profile to release · 1cb2629f
      Patrick Wendell authored
This profile is used when publishing releases. We named it 'release-profile'
because that is the Maven convention. However, it turns out this special name
causes several other undesirable things to kick in when we are creating
releases. For instance, it triggers the javadoc plugin to run, which actually
fails in our current build setup.

The fix is simply to rename the profile so that its use has no
collateral damage.
      1cb2629f
    • [SPARK-8759][SQL] add default eval to binary and unary expression according to... · c46aaf47
      Wenchen Fan authored
      [SPARK-8759][SQL] add default eval to binary and unary expression according to default behavior of nullable
      
      We have `nullSafeCodeGen` to provide default code generation for binary and unary expression, and we can do the same thing for `eval`.
      
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #7157 from cloud-fan/refactor and squashes the following commits:
      
      f3987c6 [Wenchen Fan] refactor Expression
      c46aaf47
  2. Jul 06, 2015
    • [SPARK-5562] [MLLIB] LDA should handle empty document. · 6718c1eb
      Alok Singh authored
      See the jira https://issues.apache.org/jira/browse/SPARK-5562
      
      Author: Alok  Singh <singhal@Aloks-MacBook-Pro.local>
      Author: Alok  Singh <singhal@aloks-mbp.usca.ibm.com>
      Author: Alok Singh <“singhal@us.ibm.com”>
      
      Closes #7064 from aloknsingh/aloknsingh_SPARK-5562 and squashes the following commits:
      
      259a0a7 [Alok Singh] change as per the comments by @jkbradley
      be48491 [Alok  Singh] [SPARK-5562][MLlib] re-order import in alphabhetical order
      c01311b [Alok  Singh] [SPARK-5562][MLlib] fix the newline typo
      b271c8a [Alok  Singh] [SPARK-5562][Mllib] As per github discussion with jkbradley. We would like to simply things.
      7c06251 [Alok  Singh] [SPARK-5562][MLlib] modified the JavaLDASuite for test passing
      c710cb6 [Alok  Singh] fix the scala code style to have space after :
      2572a08 [Alok  Singh] [SPARK-5562][MLlib] change the import xyz._ to the import xyz.{c1, c2} ..
      ab55fbf [Alok  Singh] [SPARK-5562][MLlib] Change as per Sean Owen's comments https://github.com/apache/spark/pull/7064/files#diff-9236d23975e6f5a5608ffc81dfd79146
      9f4f9ea [Alok  Singh] [SPARK-5562][MLlib] LDA should handle empty document.
      6718c1eb
    • [SPARK-6747] [SQL] Throw an AnalysisException when unsupported Java list types used in Hive UDF · 1821fc16
      Takeshi YAMAMURO authored
The current implementation can't handle List<> as a return type in a Hive UDF
and throws a meaningless MatchError.
Assume a UDF such as:

    public class UDFToListString extends UDF {
      public List<String> evaluate(Object o) {
        return Arrays.asList("xxx", "yyy", "zzz");
      }
    }

When the UDF is used, a scala.MatchError is thrown:

    scala.MatchError: interface java.util.List (of class java.lang.Class)
    at org.apache.spark.sql.hive.HiveInspectors$class.javaClassToDataType(HiveInspectors.scala:174)
    at org.apache.spark.sql.hive.HiveSimpleUdf.javaClassToDataType(hiveUdfs.scala:76)
    at org.apache.spark.sql.hive.HiveSimpleUdf.dataType$lzycompute(hiveUdfs.scala:106)
    at org.apache.spark.sql.hive.HiveSimpleUdf.dataType(hiveUdfs.scala:106)
    at org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:131)
    at org.apache.spark.sql.catalyst.planning.PhysicalOperation$$anonfun$collectAliases$1.applyOrElse(patterns.scala:95)
    at org.apache.spark.sql.catalyst.planning.PhysicalOperation$$anonfun$collectAliases$1.applyOrElse(patterns.scala:94)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
    at scala.collection.TraversableLike$$anonfun$collect$1.apply(TraversableLike.scala:278)
    ...

To give UDF developers a clearer signal, we should throw a more suitable exception.
      
      Author: Takeshi YAMAMURO <linguin.m.s@gmail.com>
      
      Closes #7248 from maropu/FixBugInHiveInspectors and squashes the following commits:
      
      1c3df2a [Takeshi YAMAMURO] Fix comments
      56305de [Takeshi YAMAMURO] Fix conflicts
      92ed7a6 [Takeshi YAMAMURO] Throw an exception when java list type used
      2844a8e [Takeshi YAMAMURO] Apply comments
      7114a47 [Takeshi YAMAMURO] Add TODO comments in UDFToListString of HiveUdfSuite
      fdb2ae4 [Takeshi YAMAMURO] Add StringToUtf8 to comvert String into UTF8String
      af61f2e [Takeshi YAMAMURO] Remove a new type
      7f812fd [Takeshi YAMAMURO] Fix code-style errors
      6984bf4 [Takeshi YAMAMURO] Apply review comments
      93e3d4e [Takeshi YAMAMURO] Add a blank line at the end of UDFToListString
      ee232db [Takeshi YAMAMURO] Support List as a return type in Hive UDF
      1e82316 [Takeshi YAMAMURO] Apply comments
      21e8763 [Takeshi YAMAMURO] Add TODO comments in UDFToListString of HiveUdfSuite
      a488712 [Takeshi YAMAMURO] Add StringToUtf8 to comvert String into UTF8String
      1c7b9d1 [Takeshi YAMAMURO] Remove a new type
      f965c34 [Takeshi YAMAMURO] Fix code-style errors
      9406416 [Takeshi YAMAMURO] Apply review comments
      e21ce7e [Takeshi YAMAMURO] Add a blank line at the end of UDFToListString
      e553f10 [Takeshi YAMAMURO] Support List as a return type in Hive UDF
      1821fc16
    • Revert "[SPARK-8781] Fix variables in published pom.xml are not resolved" · 929dfa24
      Andrew Or authored
      This reverts commit 82cf3315.
      
      Conflicts:
      	pom.xml
      929dfa24
    • [SPARK-8819] Fix build for maven 3.3.x · 9eae5fa6
      Andrew Or authored
      This is a workaround for MSHADE-148, which leads to an infinite loop when building Spark with maven 3.3.x. This was originally caused by #6441, which added a bunch of test dependencies on the spark-core test module. Recently, it was revealed by #7193.
      
      This patch adds a `-Prelease` profile. If present, it will set `createDependencyReducedPom` to true. The consequences are:
      - If you are releasing Spark with this profile, you are fine as long as you use maven 3.2.x or before.
      - If you are releasing Spark without this profile, you will run into SPARK-8781.
      - If you are not releasing Spark but you are using this profile, you may run into SPARK-8819.
      - If you are not releasing Spark and you did not include this profile, you are fine.
      
      This is all documented in `pom.xml` and tested locally with both versions of maven.
      
      Author: Andrew Or <andrew@databricks.com>
      
      Closes #7219 from andrewor14/fix-maven-build and squashes the following commits:
      
      1d37e87 [Andrew Or] Merge branch 'master' of github.com:apache/spark into fix-maven-build
      3574ae4 [Andrew Or] Review comments
      f39199c [Andrew Or] Create a -Prelease profile that flags `createDependencyReducedPom`
      9eae5fa6
    • [SPARK-8463][SQL] Use DriverRegistry to load jdbc driver at writing path · d4d6d31d
      Liang-Chi Hsieh authored
      JIRA: https://issues.apache.org/jira/browse/SPARK-8463
      
Currently, at the reading path, `DriverRegistry` is used to load the needed JDBC driver on executors. However, the writing path also needs `DriverRegistry` to load the JDBC driver.
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #6900 from viirya/jdbc_write_driver and squashes the following commits:
      
      16cd04b [Liang-Chi Hsieh] Use DriverRegistry to load jdbc driver at writing path.
      d4d6d31d
    • [SPARK-8072] [SQL] Better AnalysisException for writing DataFrame with identically named columns · 09a06418
      animesh authored
This adds a checkConstraints function that validates the constraints to be applied to a DataFrame / DataFrame schema. The function is called before storing the DataFrame to external storage, and is added to the corresponding data source API.
      cc rxin marmbrus
      
      Author: animesh <animesh@apache.spark>
      
      This patch had conflicts when merged, resolved by
      Committer: Michael Armbrust <michael@databricks.com>
      
      Closes #7013 from animeshbaranawal/8072 and squashes the following commits:
      
      f70dd0e [animesh] Change IO exception to Analysis Exception
      fd45e1b [animesh] 8072: Fix Style Issues
      a8a964f [animesh] 8072: Improving on previous commits
      3cc4d2c [animesh] Fix Style Issues
      1a89115 [animesh] Fix Style Issues
      98b4399 [animesh] 8072 : Moved the exception handling to ResolvedDataSource specific to parquet format
      7c3d928 [animesh] 8072: Adding check to DataFrameWriter.scala
      09a06418
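The core of such a check is small: find names that occur more than once and fail fast with an informative message. A sketch in Python for illustration (the actual check lives on the Scala write path; the function name and message wording here are ours):

```python
# Reject a schema with duplicate column names before attempting to write it.
from collections import Counter

def check_duplicate_columns(column_names):
    dupes = [name for name, n in Counter(column_names).items() if n > 1]
    if dupes:
        raise ValueError(
            "Duplicate column(s) %s found; cannot save." % ", ".join(sorted(dupes)))

check_duplicate_columns(["id", "value"])  # distinct names: passes silently
```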
    • [SPARK-8588] [SQL] Regression test · 7b467cc9
      Yin Huai authored
      This PR adds regression test for https://issues.apache.org/jira/browse/SPARK-8588 (fixed by https://github.com/apache/spark/commit/457d07eaa023b44b75344110508f629925eb6247).
      
      Author: Yin Huai <yhuai@databricks.com>
      
      This patch had conflicts when merged, resolved by
      Committer: Michael Armbrust <michael@databricks.com>
      
      Closes #7103 from yhuai/SPARK-8588-test and squashes the following commits:
      
      eb5f418 [Yin Huai] Add a query test.
      c61a173 [Yin Huai] Regression test for SPARK-8588.
      7b467cc9
    • [SPARK-8765] [MLLIB] Fix PySpark PowerIterationClustering test issue · 0effe180
      Yanbo Liang authored
The PySpark PowerIterationClustering test fails due to bad demo data:
if the dataset is too small, PowerIterationClustering behaves nondeterministically.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #7177 from yanboliang/spark-8765 and squashes the following commits:
      
      392ae54 [Yanbo Liang] fix model.assignments output
      5ec3f1e [Yanbo Liang] fix PySpark PowerIterationClustering test issue
      0effe180
    • Revert "[SPARK-7212] [MLLIB] Add sequence learning flag" · 96c5eeec
      Xiangrui Meng authored
      This reverts commit 25f574eb. After speaking to some users and developers, we realized that FP-growth doesn't meet the requirement for frequent sequence mining. PrefixSpan (SPARK-6487) would be the correct algorithm for it. feynmanliang
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #7240 from mengxr/SPARK-7212.revert and squashes the following commits:
      
      2b3d66b [Xiangrui Meng] Revert "[SPARK-7212] [MLLIB] Add sequence learning flag"
      96c5eeec
    • [SPARK-6707] [CORE] [MESOS] Mesos Scheduler should allow the user to specify... · 1165b17d
      Ankur Chauhan authored
      [SPARK-6707] [CORE] [MESOS] Mesos Scheduler should allow the user to specify constraints based on slave attributes
      
Currently, the Mesos scheduler only looks at the 'cpu' and 'mem' resources when determining the usability of a resource offer from a Mesos slave node. It may be preferable for the user to be able to ensure that Spark jobs are only started on a certain set of nodes, based on attributes.

For example, if the user sets `spark.mesos.constraints` to `tachyon=true;us-east-1=false`, then each resource offer is checked against both constraints and is only accepted to start new executors if it satisfies them.
      
      Author: Ankur Chauhan <achauhan@brightcove.com>
      
      Closes #5563 from ankurcha/mesos_attribs and squashes the following commits:
      
      902535b [Ankur Chauhan] Fix line length
      d83801c [Ankur Chauhan] Update code as per code review comments
      8b73f2d [Ankur Chauhan] Fix imports
      c3523e7 [Ankur Chauhan] Added docs
      1a24d0b [Ankur Chauhan] Expand scope of attributes matching to include all data types
      482fd71 [Ankur Chauhan] Update access modifier to private[this] for offer constraints
      5ccc32d [Ankur Chauhan] Fix nit pick whitespace
      1bce782 [Ankur Chauhan] Fix nit pick whitespace
      c0cbc75 [Ankur Chauhan] Use offer id value for debug message
      7fee0ea [Ankur Chauhan] Add debug statements
      fc7eb5b [Ankur Chauhan] Fix import codestyle
      00be252 [Ankur Chauhan] Style changes as per code review comments
      662535f [Ankur Chauhan] Incorporate code review comments + use SparkFunSuite
      fdc0937 [Ankur Chauhan] Decline offers that did not meet criteria
      67b58a0 [Ankur Chauhan] Add documentation for spark.mesos.constraints
      63f53f4 [Ankur Chauhan] Update codestyle - uniform style for config values
      02031e4 [Ankur Chauhan] Fix scalastyle warnings in tests
      c09ed84 [Ankur Chauhan] Fixed the access modifier on offerConstraints val to private[mesos]
      0c64df6 [Ankur Chauhan] Rename overhead fractions to memory_*, fix spacing
      8cc1e8f [Ankur Chauhan] Make exception message more explicit about the source of the error
      addedba [Ankur Chauhan] Added test case for malformed constraint string
      ec9d9a6 [Ankur Chauhan] Add tests for parse constraint string
      72fe88a [Ankur Chauhan] Fix up tests + remove redundant method override, combine utility class into new mesos scheduler util trait
      92b47fd [Ankur Chauhan] Add attributes based constraints support to MesosScheduler
      1165b17d
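The constraint string format above is easy to picture with a small sketch: split on `;`, split each clause on `=`, then require each offer attribute to match. Hypothetical Python helpers for illustration; the real parsing lives in the Scala Mesos scheduler utilities.

```python
# Parse "tachyon=true;us-east-1=false" into {"tachyon": "true", ...} and
# check an offer's attribute map against the parsed constraints.
def parse_constraints(spec):
    constraints = {}
    for clause in filter(None, spec.split(";")):
        key, _, value = clause.partition("=")
        constraints[key.strip()] = value.strip()
    return constraints

def offer_matches(offer_attributes, constraints):
    # Every constrained attribute must be present with the required value.
    return all(offer_attributes.get(k) == v for k, v in constraints.items())

cons = parse_constraints("tachyon=true;us-east-1=false")
```

An offer may carry extra attributes (zone, rack, ...) without affecting the match; only the constrained keys are checked.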
    • [SPARK-8656] [WEBUI] Fix the webUI and JSON API number is not synced · 9ff20334
      Wisely Chen authored
The Spark standalone master web UI shows total cores, used cores, total memory, and used memory for alive workers only, while the JSON API page (http://MASTERURL:8088/json) reports those numbers for all workers. The web UI data is therefore out of sync with the JSON API; the proper fix is to make the two report the same numbers.
      
      Author: Wisely Chen <wiselychen@appier.com>
      
      Closes #7038 from thegiive/SPARK-8656 and squashes the following commits:
      
      9e54bf0 [Wisely Chen] Change variable name to camel case
      2c8ea89 [Wisely Chen] Change some styling and add local variable
      431d2b0 [Wisely Chen] Worker List should contain DEAD node also
      8b3b8e8 [Wisely Chen] [SPARK-8656] Fix the webUI and JSON API number is not synced
      9ff20334
    • [MINOR] [SQL] remove unused code in Exchange · 132e7fca
      Daoyuan Wang authored
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #7234 from adrian-wang/exchangeclean and squashes the following commits:
      
      b093ec9 [Daoyuan Wang] remove unused code
      132e7fca
    • [SPARK-4485] [SQL] 1) Add broadcast hash outer join, (2) Fix SparkPlanTest · 2471c0bf
      kai authored
This pull request
(1) extracts the common functions used by hash outer joins and puts them in the HashOuterJoin interface,
(2) adds ShuffledHashOuterJoin and BroadcastHashOuterJoin,
(3) adds test cases for shuffled and broadcast hash outer join, and
(4) makes SparkPlanTest support binary and more complex operators, fixing bugs in plan composition in SparkPlanTest.
      
      Author: kai <kaizeng@eecs.berkeley.edu>
      
      Closes #7162 from kai-zeng/outer and squashes the following commits:
      
      3742359 [kai] Fix not-serializable exception for code-generated keys in broadcasted relations
      14e4bf8 [kai] Use CanBroadcast in broadcast outer join planning
      dc5127e [kai] code style fixes
      b5a4efa [kai] (1) Add broadcast hash outer join, (2) Fix SparkPlanTest
      2471c0bf
    • [SPARK-8784] [SQL] Add Python API for hex and unhex · 37e4d921
      Davies Liu authored
Add a Python API for hex/unhex, and also clean up Hex/Unhex.
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #7223 from davies/hex and squashes the following commits:
      
      6f1249d [Davies Liu] no explicit rule to cast string into binary
      711a6ed [Davies Liu] fix test
      f9fe5a3 [Davies Liu] Merge branch 'master' of github.com:apache/spark into hex
      f032fbb [Davies Liu] Merge branch 'hex' of github.com:davies/spark into hex
      49e325f [Davies Liu] Merge branch 'master' of github.com:apache/spark into hex
      b31fc9a [Davies Liu] Update math.scala
      25156b7 [Davies Liu] address comments and fix test
      c3af78c [Davies Liu] address commments
      1a24082 [Davies Liu] Add Python API for hex and unhex
      37e4d921
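For intuition about what the hex/unhex pair computes on string input, here is the stdlib equivalent via `binascii`. This is only an analogy; Spark's own functions may differ in edge cases (for instance, numeric arguments to hex).

```python
# hex: bytes -> hexadecimal string; unhex: hexadecimal string -> bytes.
import binascii

hexed = binascii.hexlify(b"Spark").decode("ascii")
unhexed = binascii.unhexlify(hexed)
```

The two are inverses: `unhex(hex(s)) == s` for any byte string.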
    • Small update in the readme file · 57c72fcc
      Dirceu Semighini Filho authored
      Just change the attribute from -PsparkR to -Psparkr
      
      Author: Dirceu Semighini Filho <dirceu.semighini@gmail.com>
      
      Closes #7242 from dirceusemighini/patch-1 and squashes the following commits:
      
      fad5991 [Dirceu Semighini Filho] Small update in the readme file
      57c72fcc
    • [SPARK-8837][SPARK-7114][SQL] support using keyword in column name · 0e194645
      Wenchen Fan authored
      Author: Wenchen Fan <cloud0fan@outlook.com>
      
      Closes #7237 from cloud-fan/parser and squashes the following commits:
      
      e7b49bb [Wenchen Fan] support using keyword in column name
      0e194645
    • [SPARK-8124] [SPARKR] Created more examples on SparkR DataFrames · 293225e0
      Daniel Emaasit (PhD Student) authored
Here are more examples of SparkR DataFrames, including creating a SparkContext and a
SQLContext, loading data, and simple data manipulation.
      
      Author: Daniel Emaasit (PhD Student) <daniel.emaasit@gmail.com>
      
      Closes #6668 from Emaasit/dan-dev and squashes the following commits:
      
      3a97867 [Daniel Emaasit (PhD Student)] Used fewer rows for createDataFrame
      f7227f9 [Daniel Emaasit (PhD Student)] Using command line arguments
      a550f70 [Daniel Emaasit (PhD Student)] Used base R functions
      33f9882 [Daniel Emaasit (PhD Student)] Renamed file
      b6603e3 [Daniel Emaasit (PhD Student)] changed "Describe" function to "describe"
      90565dd [Daniel Emaasit (PhD Student)] Deleted the getting-started file
      b95a103 [Daniel Emaasit (PhD Student)] Deleted this file
      cc55cd8 [Daniel Emaasit (PhD Student)] combined all the code into one .R file
      c6933af [Daniel Emaasit (PhD Student)] changed variable name to SQLContext
      8e0fe14 [Daniel Emaasit (PhD Student)] provided two options for creating DataFrames
      2653573 [Daniel Emaasit (PhD Student)] Updates to a comment and variable name
      275b787 [Daniel Emaasit (PhD Student)] Added the Apache License at the top of the file
      2e8f724 [Daniel Emaasit (PhD Student)] Added the Apache License at the top of the file
      486f44e [Daniel Emaasit (PhD Student)] Added the Apache License at the file
      d705112 [Daniel Emaasit (PhD Student)] Created more examples on SparkR DataFrames
      293225e0
    • [SPARK-8841] [SQL] Fix partition pruning percentage log message · 39e4e7e4
      Steve Lindemann authored
When pruning partitions for a query plan, a message is logged indicating how many partitions were selected based on predicate criteria, and what percent were pruned.
      
      The current release erroneously uses `1 - total/selected` to compute this quantity, leading to nonsense messages like "pruned -1000% partitions". The fix is simple and obvious.
      
      Author: Steve Lindemann <steve.lindemann@engineersgatelp.com>
      
      Closes #7227 from srlindemann/master and squashes the following commits:
      
      c788061 [Steve Lindemann] fix percentPruned log message
      39e4e7e4
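The arithmetic bug in miniature, with hypothetical helper names: percent pruned must be `1 - selected/total`, not `1 - total/selected`.

```python
# Correct: fraction of partitions NOT selected.
def percent_pruned(total, selected):
    return 100.0 * (1.0 - selected / float(total))

# Buggy form from the log message: ratio is inverted.
def buggy_percent_pruned(total, selected):
    return 100.0 * (1.0 - total / float(selected))

good = percent_pruned(100, 10)        # 90% of partitions pruned
bad = buggy_percent_pruned(100, 10)   # the nonsense negative percentage
```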
    • [SPARK-8831][SQL] Support AbstractDataType in TypeCollection. · 86768b7b
      Reynold Xin authored
      Otherwise it is impossible to declare an expression supporting DecimalType.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7232 from rxin/typecollection-adt and squashes the following commits:
      
      934d3d1 [Reynold Xin] [SPARK-8831][SQL] Support AbstractDataType in TypeCollection.
      86768b7b
  3. Jul 05, 2015
    • [SQL][Minor] Update the DataFrame API for encode/decode · 6d0411b4
      Cheng Hao authored
This is the follow-up to #6843.
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #7230 from chenghao-intel/str_funcs2_followup and squashes the following commits:
      
      52cc553 [Cheng Hao] update the code as comment
      6d0411b4
    • [SPARK-8549] [SPARKR] Fix the line length of SparkR · a0cb111b
      Yu ISHIKAWA authored
      [[SPARK-8549] Fix the line length of SparkR - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-8549)
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #7204 from yu-iskw/SPARK-8549 and squashes the following commits:
      
      6fb131a [Yu ISHIKAWA] Fix the typo
      1737598 [Yu ISHIKAWA] [SPARK-8549][SparkR] Fix the line length of SparkR
      a0cb111b
    • [SPARK-7137] [ML] Update SchemaUtils checkInputColumn to print more info if needed · f9c448dc
      Joshi authored
      Author: Joshi <rekhajoshm@gmail.com>
      Author: Rekha Joshi <rekhajoshm@gmail.com>
      
      Closes #5992 from rekhajoshm/fix/SPARK-7137 and squashes the following commits:
      
      8c42b57 [Joshi] update checkInputColumn to print more info if needed
      33ddd2e [Joshi] update checkInputColumn to print more info if needed
      acf3e17 [Joshi] update checkInputColumn to print more info if needed
      8993c0e [Joshi] SPARK-7137: Add checkInputColumn back to Params and print more info
      e3677c9 [Rekha Joshi] Merge pull request #1 from apache/master
      f9c448dc
    • [MINOR] [SQL] Minor fix for CatalystSchemaConverter · 2b820f2a
      Liang-Chi Hsieh authored
      ping liancheng
      
      Author: Liang-Chi Hsieh <viirya@gmail.com>
      
      Closes #7224 from viirya/few_fix_catalystschema and squashes the following commits:
      
      d994330 [Liang-Chi Hsieh] Minor fix for CatalystSchemaConverter.
      2b820f2a
  4. Jul 04, 2015
    • [SPARK-8822][SQL] clean up type checking in math.scala. · c991ef5a
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7220 from rxin/SPARK-8822 and squashes the following commits:
      
      0cda076 [Reynold Xin] Test cases.
      22d0463 [Reynold Xin] Fixed type precedence.
      beb2a97 [Reynold Xin] [SPARK-8822][SQL] clean up type checking in math.scala.
      c991ef5a
    • [SQL] More unit tests for implicit type cast & add simpleString to AbstractDataType. · 347cab85
      Reynold Xin authored
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #7221 from rxin/implicit-cast-tests and squashes the following commits:
      
      64b13bd [Reynold Xin] Fixed a bug ..
      489b732 [Reynold Xin] [SQL] More unit tests for implicit type cast & add simpleString to AbstractDataType.
      347cab85
    • 48f7aed6
    • [SPARK-8270][SQL] levenshtein distance · 6b3574e6
      Tarek Auel authored
      Jira: https://issues.apache.org/jira/browse/SPARK-8270
      
Info: I cannot build the latest master; it gets stuck during the build at: `[INFO] Dependency-reduced POM written at: /Users/tarek/test/spark/bagel/dependency-reduced-pom.xml`
      
      Author: Tarek Auel <tarek.auel@googlemail.com>
      
      Closes #7214 from tarekauel/SPARK-8270 and squashes the following commits:
      
      ab348b9 [Tarek Auel] Merge branch 'master' into SPARK-8270
      a2ad318 [Tarek Auel] [SPARK-8270] changed order of fields
      d91b12c [Tarek Auel] [SPARK-8270] python fix
      adbd075 [Tarek Auel] [SPARK-8270] fixed typo
      23185c9 [Tarek Auel] [SPARK-8270] levenshtein distance
      6b3574e6
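For reference, the function the new SQL expression exposes is the classic dynamic-programming edit distance. A self-contained sketch (not Spark's JVM-side implementation):

```python
# Levenshtein distance with a rolling row: O(len(s) * len(t)) time,
# O(len(t)) space.
def levenshtein(s, t):
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (cs != ct)))    # substitution
        prev = cur
    return prev[-1]

d = levenshtein("kitten", "sitting")
```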
    • [SPARK-8238][SPARK-8239][SPARK-8242][SPARK-8243][SPARK-8268][SQL]Add... · f35b0c34
      Cheng Hao authored
      [SPARK-8238][SPARK-8239][SPARK-8242][SPARK-8243][SPARK-8268][SQL]Add ascii/base64/unbase64/encode/decode functions
      
      Add `ascii`,`base64`,`unbase64`,`encode` and `decode` expressions.
      
      Author: Cheng Hao <hao.cheng@intel.com>
      
      Closes #6843 from chenghao-intel/str_funcs2 and squashes the following commits:
      
      78dee7d [Cheng Hao] base 64 -> base64
      9d6f9f4 [Cheng Hao] remove the toString method for expressions
      ed5c19c [Cheng Hao] update code as comments
      96170fc [Cheng Hao] scalastyle issues
      e2df768 [Cheng Hao] remove the unused import
      491ce7b [Cheng Hao] add ascii/base64/unbase64/encode/decode functions
      f35b0c34
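A rough stdlib mapping of what three of the new expressions compute on string input, for intuition only (Spark's own semantics, e.g. for non-ASCII or numeric arguments, may differ): `base64`/`unbase64` correspond to Base64 encode/decode, and `ascii(s)` returns the code point of the first character.

```python
# Stdlib analogues of base64/unbase64 and ascii on byte-string input.
import base64

encoded = base64.b64encode(b"Spark SQL").decode("ascii")
decoded = base64.b64decode(encoded)
first = ord("Spark"[0])  # the ascii('Spark')-style result
```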
    • [SPARK-8777] [SQL] Add random data generator test utilities to Spark SQL · f32487b7
      Josh Rosen authored
      This commit adds a set of random data generation utilities to Spark SQL, for use in its own unit tests.
      
      - `RandomDataGenerator.forType(DataType)` returns an `Option[() => Any]` that, if defined, contains a function for generating random values for the given DataType.  The random values use the external representations for the given DataType (for example, for DateType we return `java.sql.Date` instances instead of longs).
      - `DateTypeTestUtilities` defines some convenience fields for looping over instances of data types.  For example, `numericTypes` holds `DataType` instances for all supported numeric types.  These constants will help us to raise the level of abstraction in our tests.  For example, it's now very easy to write a test which is parameterized by all common data types.
      
      Author: Josh Rosen <joshrosen@databricks.com>
      
      Closes #7176 from JoshRosen/sql-random-data-generators and squashes the following commits:
      
      f71634d [Josh Rosen] Roll back ScalaCheck usage
      e0d7d49 [Josh Rosen] Bump ScalaCheck version in LICENSE
      89d86b1 [Josh Rosen] Bump ScalaCheck version.
      0c20905 [Josh Rosen] Initial attempt at using ScalaCheck.
      b55875a [Josh Rosen] Generate doubles and floats over entire possible range.
      5acdd5c [Josh Rosen] Infinity and NaN are interesting.
      ab76cbd [Josh Rosen] Move code to Catalyst package.
      d2b4a4a [Josh Rosen] Add random data generator test utilities to Spark SQL.
      f32487b7
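The `forType`-style contract described above (an Option that, if defined, holds a zero-arg generator for the type) can be sketched like this. The type names and value ranges are illustrative, not Spark SQL's DataType classes; Python's `None` plays the role of an empty Option.

```python
# Map a type name to a zero-arg random-value generator, or None if the type
# is unsupported -- mirroring an Option[() => Any] return type.
import random

def for_type(type_name, rng=random):
    generators = {
        "IntegerType": lambda: rng.randint(-2**31, 2**31 - 1),
        "DoubleType": lambda: rng.uniform(-1e9, 1e9),
        "BooleanType": lambda: rng.random() < 0.5,
    }
    return generators.get(type_name)  # None for unsupported types

gen = for_type("IntegerType", random.Random(42))  # seeded for repeatability
value = gen()
```

A test parameterized over many types can then skip any type for which the lookup returns the empty option.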
    • [SPARK-8192] [SPARK-8193] [SQL] udf current_date, current_timestamp · 9fb6b832
      Daoyuan Wang authored
      Author: Daoyuan Wang <daoyuan.wang@intel.com>
      
      Closes #6985 from adrian-wang/udfcurrent and squashes the following commits:
      
      6a20b64 [Daoyuan Wang] remove codegen and add lazy in testsuite
      27c9f95 [Daoyuan Wang] refine tests..
      e11ae75 [Daoyuan Wang] refine tests
      61ed3d5 [Daoyuan Wang] add in functions
      98e8550 [Daoyuan Wang] fix sytle
      427d9dc [Daoyuan Wang] add tests and codegen
      0b69a1f [Daoyuan Wang] udf current
      9fb6b832