- Mar 23, 2017
-
-
sureshthalamati authored
[SPARK-10849][SQL] Adds option to the JDBC data source write for user to specify database column type for the create table ## What changes were proposed in this pull request? Currently JDBC data source creates tables in the target database using the default type mapping, and the JDBC dialect mechanism. If users want to specify different database data type for only some of columns, there is no option available. In scenarios where default mapping does not work, users are forced to create tables on the target database before writing. This workaround is probably not acceptable from a usability point of view. This PR is to provide a user-defined type mapping for specific columns. The solution is to allow users to specify database column data type for the create table as JDBC datasource option(createTableColumnTypes) on write. Data type information can be specified in the same format as table schema DDL format (e.g: `name CHAR(64), comments VARCHAR(1024)`). All supported target database types can not be specified , the data types has to be valid spark sql data types also. For example user can not specify target database CLOB data type. This will be supported in the follow-up PR. Example: ```Scala df.write .option("createTableColumnTypes", "name CHAR(64), comments VARCHAR(1024)") .jdbc(url, "TEST.DBCOLTYPETEST", properties) ``` ## How was this patch tested? Added new test cases to the JDBCWriteSuite Author: sureshthalamati <suresh.thalamati@gmail.com> Closes #16209 from sureshthalamati/jdbc_custom_dbtype_option_json-spark-10849.
-
- Mar 02, 2017
-
-
Nick Pentreath authored
[SPARK-14489](https://issues.apache.org/jira/browse/SPARK-14489) added the ability to skip `NaN` predictions during `ALSModel.transform`. This PR adds documentation for the `coldStartStrategy` param to the ALS user guide, and add code to the examples to illustrate usage. ## How was this patch tested? Doc and example change only. Build HTML doc locally and verified example code builds, and runs in shell for Scala/Python. Author: Nick Pentreath <nickp@za.ibm.com> Closes #17102 from MLnick/SPARK-19345-coldstart-doc.
-
Yun Ni authored
## What changes were proposed in this pull request? Remove `org.apache.spark.examples.` in Add slash in one of the python doc. ## How was this patch tested? Run examples using the commands in the comments. Author: Yun Ni <yunn@uber.com> Closes #17104 from Yunni/yunn_minor.
-
- Mar 01, 2017
-
-
wm624@hotmail.com authored
[SPARK-19460][SPARKR] Update dataset used in R documentation, examples to reduce warning noise and confusions ## What changes were proposed in this pull request? Replace `iris` dataset with `Titanic` or other dataset in example and document. ## How was this patch tested? Manual and existing test Author: wm624@hotmail.com <wm624@hotmail.com> Closes #17032 from wangmiao1981/example.
-
- Feb 27, 2017
-
-
hyukjinkwon authored
## What changes were proposed in this pull request? This PR proposes to replace the deprecated `json(RDD[String])` usage to `json(Dataset[String])`. This currently produces so many warnings. ## How was this patch tested? Fixed tests. Author: hyukjinkwon <gurwls223@gmail.com> Closes #17071 from HyukjinKwon/SPARK-15615-followup.
-
hyukjinkwon authored
## What changes were proposed in this pull request? This PR proposes to fix the lint-breaks as below: ``` [ERROR] src/test/java/org/apache/spark/network/TransportResponseHandlerSuite.java:[29,8] (imports) UnusedImports: Unused import - org.apache.spark.network.buffer.ManagedBuffer. [ERROR] src/main/java/org/apache/spark/unsafe/types/UTF8String.java:[156,10] (modifier) ModifierOrder: 'Nonnull' annotation modifier does not precede non-annotation modifiers. [ERROR] src/main/java/org/apache/spark/SparkFirehoseListener.java:[122] (sizes) LineLength: Line is longer than 100 characters (found 105). [ERROR] src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java:[164,78] (coding) OneStatementPerLine: Only one statement per line allowed. [ERROR] src/test/java/test/org/apache/spark/JavaAPISuite.java:[1157] (sizes) LineLength: Line is longer than 100 characters (found 121). [ERROR] src/test/java/org/apache/spark/streaming/JavaMapWithStateSuite.java:[149] (sizes) LineLength: Line is longer than 100 characters (found 113). [ERROR] src/test/java/test/org/apache/spark/streaming/Java8APISuite.java:[146] (sizes) LineLength: Line is longer than 100 characters (found 122). [ERROR] src/test/java/test/org/apache/spark/streaming/JavaAPISuite.java:[32,8] (imports) UnusedImports: Unused import - org.apache.spark.streaming.Time. [ERROR] src/test/java/test/org/apache/spark/streaming/JavaAPISuite.java:[611] (sizes) LineLength: Line is longer than 100 characters (found 101). [ERROR] src/test/java/test/org/apache/spark/streaming/JavaAPISuite.java:[1317] (sizes) LineLength: Line is longer than 100 characters (found 102). [ERROR] src/test/java/test/org/apache/spark/sql/JavaDatasetAggregatorSuite.java:[91] (sizes) LineLength: Line is longer than 100 characters (found 102). [ERROR] src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java:[113] (sizes) LineLength: Line is longer than 100 characters (found 101). [ERROR] src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java:[164] (sizes) LineLength: Line is longer than 100 characters (found 110). [ERROR] src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java:[212] (sizes) LineLength: Line is longer than 100 characters (found 114). [ERROR] src/test/java/org/apache/spark/mllib/tree/JavaDecisionTreeSuite.java:[36] (sizes) LineLength: Line is longer than 100 characters (found 101). [ERROR] src/main/java/org/apache/spark/examples/streaming/JavaKinesisWordCountASL.java:[26,8] (imports) UnusedImports: Unused import - com.amazonaws.regions.RegionUtils. [ERROR] src/test/java/org/apache/spark/streaming/kinesis/JavaKinesisStreamSuite.java:[20,8] (imports) UnusedImports: Unused import - com.amazonaws.regions.RegionUtils. [ERROR] src/test/java/org/apache/spark/streaming/kinesis/JavaKinesisStreamSuite.java:[94] (sizes) LineLength: Line is longer than 100 characters (found 103). [ERROR] src/main/java/org/apache/spark/examples/ml/JavaTokenizerExample.java:[30,8] (imports) UnusedImports: Unused import - org.apache.spark.sql.api.java.UDF1. [ERROR] src/main/java/org/apache/spark/examples/ml/JavaTokenizerExample.java:[72] (sizes) LineLength: Line is longer than 100 characters (found 104). [ERROR] src/main/java/org/apache/spark/examples/mllib/JavaRankingMetricsExample.java:[121] (sizes) LineLength: Line is longer than 100 characters (found 101). [ERROR] src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java:[28,8] (imports) UnusedImports: Unused import - org.apache.spark.api.java.JavaRDD. [ERROR] src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java:[29,8] (imports) UnusedImports: Unused import - org.apache.spark.api.java.JavaSparkContext. ``` ## How was this patch tested? Manually via ```bash ./dev/lint-java ``` Author: hyukjinkwon <gurwls223@gmail.com> Closes #17072 from HyukjinKwon/java-lint.
-
- Feb 25, 2017
-
-
Boaz Mohar authored
## What changes were proposed in this pull request? Removed duplicated lines in sql python example and found a typo. ## How was this patch tested? Searched for other typo's in the page to minimize PR's. Author: Boaz Mohar <boazmohar@gmail.com> Closes #17066 from boazmohar/doc-fix.
-
- Feb 21, 2017
-
-
Yuhao Yang authored
## What changes were proposed in this pull request? Documentation and examples (Java, scala, python, R) for LinearSVC ## How was this patch tested? local doc generation Author: Yuhao Yang <yuhao.yang@intel.com> Closes #16968 from hhbyyh/mlsvmdoc.
-
- Feb 19, 2017
-
-
Sean Owen authored
## What changes were proposed in this pull request? Convert Java tests to use lambdas, Java 8 features. ## How was this patch tested? Jenkins tests. Author: Sean Owen <sowen@cloudera.com> Closes #16961 from srowen/SPARK-19533.
-
- Feb 17, 2017
-
-
wm624@hotmail.com authored
## What changes were proposed in this pull request? We recently add the spark.svmLinear API for SparkR. We need to add an example and update the vignettes. ## How was this patch tested? Manually run example. Author: wm624@hotmail.com <wm624@hotmail.com> Closes #16969 from wangmiao1981/example.
-
Felix Cheung authored
## What changes were proposed in this pull request? stop session at end of example ## How was this patch tested? manual Author: Felix Cheung <felixcheung_m@hotmail.com> Closes #16973 from felixcheung/rexamples.
-
- Feb 16, 2017
-
-
Sean Owen authored
- Move external/java8-tests tests into core, streaming, sql and remove - Remove MaxPermGen and related options - Fix some reflection / TODOs around Java 8+ methods - Update doc references to 1.7/1.8 differences - Remove Java 7/8 related build profiles - Update some plugins for better Java 8 compatibility - Fix a few Java-related warnings For the future: - Update Java 8 examples to fully use Java 8 - Update Java tests to use lambdas for simplicity - Update Java internal implementations to use lambdas ## How was this patch tested? Existing tests Author: Sean Owen <sowen@cloudera.com> Closes #16871 from srowen/SPARK-19493.
-
- Feb 15, 2017
-
-
Yun Ni authored
## What changes were proposed in this pull request? This pull request includes python API and examples for LSH. The API changes was based on yanboliang 's PR #15768 and resolved conflicts and API changes on the Scala API. The examples are consistent with Scala examples of MinHashLSH and BucketedRandomProjectionLSH. ## How was this patch tested? API and examples are tested using spark-submit: `bin/spark-submit examples/src/main/python/ml/min_hash_lsh.py` `bin/spark-submit examples/src/main/python/ml/bucketed_random_projection_lsh.py` User guide changes are generated and manually inspected: `SKIP_API=1 jekyll build` Author: Yun Ni <yunn@uber.com> Author: Yanbo Liang <ybliang8@gmail.com> Author: Yunni <Euler57721@gmail.com> Closes #16715 from Yunni/spark-18080.
-
- Feb 13, 2017
-
-
Xiao Li authored
[SPARK-19574][ML][DOCUMENTATION] Fix Liquid Exception: Start indices amount is not equal to end indices amount ### What changes were proposed in this pull request? ``` Liquid Exception: Start indices amount is not equal to end indices amount, see /Users/xiao/IdeaProjects/sparkDelivery/docs/../examples/src/main/java/org/apache/spark/examples/ml/JavaTokenizerExample.java. in ml-features.md ``` So far, the build is broken after merging https://github.com/apache/spark/pull/16789 This PR is to fix it. ## How was this patch tested? Manual Author: Xiao Li <gatorsmile@gmail.com> Closes #16908 from gatorsmile/docMLFix.
-
- Feb 07, 2017
-
-
Aseem Bansal authored
## What changes were proposed in this pull request? SPARK-19444 imports not being present in documentation ## How was this patch tested? Manual ## Disclaimer Contribution is original work and I license the work to the project under the project’s open source license Author: Aseem Bansal <anshbansal@users.noreply.github.com> Closes #16789 from anshbansal/patch-1.
-
- Feb 03, 2017
-
-
krishnakalyan3 authored
## What changes were proposed in this pull request? Update programming guide, example and vignette with Bisecting k-means. Author: krishnakalyan3 <krishnakalyan3@gmail.com> Closes #16767 from krishnakalyan3/bisecting-kmeans.
-
- Jan 25, 2017
-
-
aokolnychyi authored
## What changes were proposed in this pull request? - A separate subsection for Aggregations under “Getting Started” in the Spark SQL programming guide. It mentions which aggregate functions are predefined and how users can create their own. - Examples of using the `UserDefinedAggregateFunction` abstract class for untyped aggregations in Java and Scala. - Examples of using the `Aggregator` abstract class for type-safe aggregations in Java and Scala. - Python is not covered. - The PR might not resolve the ticket since I do not know what exactly was planned by the author. In total, there are four new standalone examples that can be executed via `spark-submit` or `run-example`. The updated Spark SQL programming guide references to these examples and does not contain hard-coded snippets. ## How was this patch tested? The patch was tested locally by building the docs. The examples were run as well.  Author: aokolnychyi <okolnychyyanton@gmail.com> Closes #16329 from aokolnychyi/SPARK-16046.
-
- Jan 18, 2017
-
-
uncleGen authored
## What changes were proposed in this pull request? remove ununsed imports and outdated comments, and fix some minor code style issue. ## How was this patch tested? existing ut Author: uncleGen <hustyugm@gmail.com> Closes #16591 from uncleGen/SPARK-19227.
-
- Jan 12, 2017
-
-
Yanbo Liang authored
## What changes were proposed in this pull request? ```ml.R``` example depends on ```e1071``` package, if it's not available in users' environment, it will fail. I think the example should not depends on third-party packages, so I update it to remove the dependency. ## How was this patch tested? Manual test. Author: Yanbo Liang <ybliang8@gmail.com> Closes #16548 from yanboliang/spark-19158.
-
- Jan 10, 2017
-
-
hyukjinkwon authored
## What changes were proposed in this pull request? **binary_classification_metrics_example.py** LibSVM datasource loads `ml.linalg.SparseVector` whereas the example requires it to be `mllib.linalg.SparseVector`. For the equivalent Scala exmaple, `BinaryClassificationMetricsExample.scala` seems fine. ``` ./bin/spark-submit examples/src/main/python/mllib/binary_classification_metrics_example.py ``` ``` File ".../spark/examples/src/main/python/mllib/binary_classification_metrics_example.py", line 39, in <lambda> .rdd.map(lambda row: LabeledPoint(row[0], row[1])) File ".../spark/python/pyspark/mllib/regression.py", line 54, in __init__ self.features = _convert_to_vector(features) File ".../spark/python/pyspark/mllib/linalg/__init__.py", line 80, in _convert_to_vector raise TypeError("Cannot convert type %s into Vector" % type(l)) TypeError: Cannot convert type <class 'pyspark.ml.linalg.SparseVector'> into Vector ``` **status_api_demo.py** (this one does not work on Python 3.4.6) It's `queue` in Python 3+. ``` PYSPARK_PYTHON=python3 ./bin/spark-submit examples/src/main/python/status_api_demo.py ``` ``` Traceback (most recent call last): File ".../spark/examples/src/main/python/status_api_demo.py", line 22, in <module> import Queue ImportError: No module named 'Queue' ``` **bisecting_k_means_example.py** `BisectingKMeansModel` does not implement `save` and `load` in Python. ```bash ./bin/spark-submit examples/src/main/python/mllib/bisecting_k_means_example.py ``` ``` Traceback (most recent call last): File ".../spark/examples/src/main/python/mllib/bisecting_k_means_example.py", line 46, in <module> model.save(sc, path) AttributeError: 'BisectingKMeansModel' object has no attribute 'save' ``` **elementwise_product_example.py** It calls `collect` from the vector. ```bash ./bin/spark-submit examples/src/main/python/mllib/elementwise_product_example.py ``` ``` Traceback (most recent call last): File ".../spark/examples/src/main/python/mllib/elementwise_product_example.py", line 48, in <module> for each in transformedData2.collect(): File ".../spark/python/pyspark/mllib/linalg/__init__.py", line 478, in __getattr__ return getattr(self.array, item) AttributeError: 'numpy.ndarray' object has no attribute 'collect' ``` **These three tests look throwing an exception for a relative path set in `spark.sql.warehouse.dir`.** **hive.py** ``` ./bin/spark-submit examples/src/main/python/sql/hive.py ``` ``` Traceback (most recent call last): File ".../spark/examples/src/main/python/sql/hive.py", line 47, in <module> spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) USING hive") File ".../spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 541, in sql File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__ File ".../spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 69, in deco pyspark.sql.utils.AnalysisException: 'org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:./spark-warehouse);' ``` **SparkHiveExample.scala** ``` ./bin/run-example sql.hive.SparkHiveExample ``` ``` Exception in thread "main" org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:./spark-warehouse at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:498) at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:484) at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1668) ``` **JavaSparkHiveExample.java** ``` ./bin/run-example sql.hive.JavaSparkHiveExample ``` ``` Exception in thread "main" org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:./spark-warehouse at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:498) at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:484) at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1668) ``` ## How was this patch tested? Manually via ``` ./bin/spark-submit examples/src/main/python/mllib/binary_classification_metrics_example.py ``` ``` PYSPARK_PYTHON=python3 ./bin/spark-submit examples/src/main/python/status_api_demo.py ``` ``` ./bin/spark-submit examples/src/main/python/mllib/bisecting_k_means_example.py ``` ``` ./bin/spark-submit examples/src/main/python/mllib/elementwise_product_example.py ``` ``` ./bin/spark-submit examples/src/main/python/sql/hive.py ``` ``` ./bin/run-example sql.hive.JavaSparkHiveExample ``` ``` ./bin/run-example sql.hive.SparkHiveExample ``` These were found via ```bash find ./examples/src/main/python -name "*.py" -exec spark-submit {} \; ``` Author: hyukjinkwon <gurwls223@gmail.com> Closes #16515 from HyukjinKwon/minor-example-fix.
-
- Jan 05, 2017
-
-
Wenchen Fan authored
## What changes were proposed in this pull request? Today we have different syntax to create data source or hive serde tables, we should unify them to not confuse users and step forward to make hive a data source. Please read https://issues.apache.org/jira/secure/attachment/12843835/CREATE-TABLE.pdf for details. TODO(for follow-up PRs): 1. TBLPROPERTIES is not added to the new syntax, we should decide if we wanna add it later. 2. `SHOW CREATE TABLE` should be updated to use the new syntax. 3. we should decide if we wanna change the behavior of `SET LOCATION`. ## How was this patch tested? new tests Author: Wenchen Fan <wenchen@databricks.com> Closes #16296 from cloud-fan/create-table.
-
- Jan 04, 2017
-
-
Niranjan Padmanabhan authored
## What changes were proposed in this pull request? There are many locations in the Spark repo where the same word occurs consecutively. Sometimes they are appropriately placed, but many times they are not. This PR removes the inappropriately duplicated words. ## How was this patch tested? N/A since only docs or comments were updated. Author: Niranjan Padmanabhan <niranjan.padmanabhan@gmail.com> Closes #16455 from neurons/np.structure_streaming_doc.
-
- Jan 03, 2017
-
-
Weiqing Yang authored
## What changes were proposed in this pull request? Add `finally` clause for `sc.stop()` in the `test("register and deregister Spark listener from SparkContext")`. ## How was this patch tested? Pass the build and unit tests. Author: Weiqing Yang <yangweiqing001@gmail.com> Closes #16426 from weiqingy/testIssue.
-
- Jan 02, 2017
-
-
hyukjinkwon authored
## What changes were proposed in this pull request? This PR proposes to check pep8 against all other Python scripts and fix the errors as below: ```bash ./dev/create-release/generate-contributors.py ./dev/create-release/releaseutils.py ./dev/create-release/translate-contributors.py ./dev/lint-python ./python/docs/epytext.py ./examples/src/main/python/mllib/decision_tree_classification_example.py ./examples/src/main/python/mllib/decision_tree_regression_example.py ./examples/src/main/python/mllib/gradient_boosting_classification_example.py ./examples/src/main/python/mllib/gradient_boosting_regression_example.py ./examples/src/main/python/mllib/linear_regression_with_sgd_example.py ./examples/src/main/python/mllib/logistic_regression_with_lbfgs_example.py ./examples/src/main/python/mllib/naive_bayes_example.py ./examples/src/main/python/mllib/random_forest_classification_example.py ./examples/src/main/python/mllib/random_forest_regression_example.py ./examples/src/main/python/mllib/svm_with_sgd_example.py ./examples/src/main/python/streaming/network_wordjoinsentiments.py ./sql/hive/src/test/resources/data/scripts/cat.py ./sql/hive/src/test/resources/data/scripts/cat_error.py ./sql/hive/src/test/resources/data/scripts/doubleescapedtab.py ./sql/hive/src/test/resources/data/scripts/dumpdata_script.py ./sql/hive/src/test/resources/data/scripts/escapedcarriagereturn.py ./sql/hive/src/test/resources/data/scripts/escapednewline.py ./sql/hive/src/test/resources/data/scripts/escapedtab.py ./sql/hive/src/test/resources/data/scripts/input20_script.py ./sql/hive/src/test/resources/data/scripts/newline.py ``` ## How was this patch tested? - `./python/docs/epytext.py` ```bash cd ./python/docs $$ make html ``` - pep8 check (Python 2.7 / Python 3.3.6) ``` ./dev/lint-python ``` - `./dev/merge_spark_pr.py` (Python 2.7 only / Python 3.3.6 not working) ```bash python -m doctest -v ./dev/merge_spark_pr.py ``` - `./dev/create-release/releaseutils.py` `./dev/create-release/generate-contributors.py` `./dev/create-release/translate-contributors.py` (Python 2.7 only / Python 3.3.6 not working) ```bash python generate-contributors.py python translate-contributors.py ``` - Examples (Python 2.7 / Python 3.3.6) ```bash ./bin/spark-submit examples/src/main/python/mllib/decision_tree_classification_example.py ./bin/spark-submit examples/src/main/python/mllib/decision_tree_regression_example.py ./bin/spark-submit examples/src/main/python/mllib/gradient_boosting_classification_example.py ./bin/spark-submit examples/src/main/python/mllib/gradient_boosting_regression_example.p ./bin/spark-submit examples/src/main/python/mllib/random_forest_classification_example.py ./bin/spark-submit examples/src/main/python/mllib/random_forest_regression_example.py ``` - Examples (Python 2.7 only / Python 3.3.6 not working) ``` ./bin/spark-submit examples/src/main/python/mllib/linear_regression_with_sgd_example.py ./bin/spark-submit examples/src/main/python/mllib/logistic_regression_with_lbfgs_example.py ./bin/spark-submit examples/src/main/python/mllib/naive_bayes_example.py ./bin/spark-submit examples/src/main/python/mllib/svm_with_sgd_example.py ``` - `sql/hive/src/test/resources/data/scripts/*.py` (Python 2.7 / Python 3.3.6 within suggested changes) Manually tested only changed ones. - `./dev/github_jira_sync.py` (Python 2.7 only / Python 3.3.6 not working) Manually tested this after disabling actually adding comments and links. And also via Jenkins tests. Author: hyukjinkwon <gurwls223@gmail.com> Closes #16405 from HyukjinKwon/minor-pep8.
-
- Dec 08, 2016
-
-
Yanbo Liang authored
## What changes were proposed in this pull request? * Add all R examples for ML wrappers which were added during 2.1 release cycle. * Split the whole ```ml.R``` example file into individual example for each algorithm, which will be convenient for users to rerun them. * Add corresponding examples to ML user guide. * Update ML section of SparkR user guide. Note: MLlib Scala/Java/Python examples will be consistent, however, SparkR examples may different from them, since R users may use the algorithms in a different way, for example, using R ```formula``` to specify ```featuresCol``` and ```labelCol```. ## How was this patch tested? Run all examples manually. Author: Yanbo Liang <ybliang8@gmail.com> Closes #16148 from yanboliang/spark-18325.
-
- Dec 07, 2016
-
-
wm624@hotmail.com authored
## What changes were proposed in this pull request? Logistic Regression summary is added in Python API. We need to add example and document for summary. The newly added example is consistent with Scala and Java examples. ## How was this patch tested? Manually tests: Run the example with spark-submit; copy & paste code into pyspark; build document and check the document. Author: wm624@hotmail.com <wm624@hotmail.com> Closes #16064 from wangmiao1981/py.
-
- Dec 03, 2016
-
-
Yunni authored
## What changes were proposed in this pull request? The user guide for LSH is added to ml-features.md, with several scala/java examples in spark-examples. ## How was this patch tested? Doc has been generated through Jekyll, and checked through manual inspection. Author: Yunni <Euler57721@gmail.com> Author: Yun Ni <yunn@uber.com> Author: Joseph K. Bradley <joseph@databricks.com> Author: Yun Ni <Euler57721@gmail.com> Closes #15795 from Yunni/SPARK-18081-lsh-guide.
-
- Dec 02, 2016
-
-
Reynold Xin authored
## What changes were proposed in this pull request? This patch bumps master branch version to 2.2.0-SNAPSHOT. ## How was this patch tested? N/A Author: Reynold Xin <rxin@databricks.com> Closes #16126 from rxin/SPARK-18695.
-
- Nov 16, 2016
-
-
Xianyang Liu authored
## What changes were proposed in this pull request? Small fix, fix the errors caused by lint check in Java - Clear unused objects and `UnusedImports`. - Add comments around the method `finalize` of `NioBufferedFileInputStream`to turn off checkstyle. - Cut the line which is longer than 100 characters into two lines. ## How was this patch tested? Travis CI. ``` $ build/mvn -T 4 -q -DskipTests -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive -Phive-thriftserver install $ dev/lint-java ``` Before: ``` Checkstyle checks failed at following occurrences: [ERROR] src/main/java/org/apache/spark/network/util/TransportConf.java:[21,8] (imports) UnusedImports: Unused import - org.apache.commons.crypto.cipher.CryptoCipherFactory. [ERROR] src/test/java/org/apache/spark/network/sasl/SparkSaslSuite.java:[516,5] (modifier) RedundantModifier: Redundant 'public' modifier. [ERROR] src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java:[133] (coding) NoFinalizer: Avoid using finalizer method. [ERROR] src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeMapData.java:[71] (sizes) LineLength: Line is longer than 100 characters (found 113). [ERROR] src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java:[112] (sizes) LineLength: Line is longer than 100 characters (found 110). [ERROR] src/test/java/org/apache/spark/sql/catalyst/expressions/HiveHasherSuite.java:[31,17] (modifier) ModifierOrder: 'static' modifier out of order with the JLS suggestions. [ERROR]src/main/java/org/apache/spark/examples/ml/JavaLogisticRegressionWithElasticNetExample.java:[64] (sizes) LineLength: Line is longer than 100 characters (found 103). [ERROR] src/main/java/org/apache/spark/examples/ml/JavaInteractionExample.java:[22,8] (imports) UnusedImports: Unused import - org.apache.spark.ml.linalg.Vectors. [ERROR] src/main/java/org/apache/spark/examples/ml/JavaInteractionExample.java:[51] (regexp) RegexpSingleline: No trailing whitespace allowed. ``` After: ``` $ build/mvn -T 4 -q -DskipTests -Pyarn -Phadoop-2.3 -Pkinesis-asl -Phive -Phive-thriftserver install $ dev/lint-java Using `mvn` from path: /home/travis/build/ConeyLiu/spark/build/apache-maven-3.3.9/bin/mvn Checkstyle checks passed. ``` Author: Xianyang Liu <xyliu0530@icloud.com> Closes #15865 from ConeyLiu/master.
-
uncleGen authored
## What changes were proposed in this pull request? This PR provides structured kafka wordcount examples ## How was this patch tested? Author: uncleGen <hustyugm@gmail.com> Closes #15849 from uncleGen/SPARK-18410.
-
- Nov 15, 2016
-
-
Zheng RuiFeng authored
## What changes were proposed in this pull request? 1,Remove `runs` from docs of mllib.KMeans 2,Add notes for `k` according to comments in sources ## How was this patch tested? existing tests Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #15873 from zhengruifeng/update_doc_mllib_kmeans.
-
- Nov 10, 2016
-
-
wm624@hotmail.com authored
## What changes were proposed in this pull request? This is a follow-up work of #15618. Close file source; For any newly created streaming context outside the withContext, explicitly close the context. ## How was this patch tested? Existing unit tests. Author: wm624@hotmail.com <wm624@hotmail.com> Closes #15818 from wangmiao1981/rtest.
-
- Nov 08, 2016
-
-
chie8842 authored
I created Scala and Java example and added documentation. Author: chie8842 <hayashidac@nttdata.co.jp> Closes #15658 from hayashidac/SPARK-13770.
-
- Nov 02, 2016
-
-
Maria Rydzy authored
## What changes were proposed in this pull request? If my understanding is correct we should be rather looking at closed disk than the opened one. ## How was this patch tested? Run simple comparison, of the mean squared error of approaches with closed and opened disk. https://gist.github.com/mrydzy/1cf0e5c316ef9d6fbd91426b91f1969f The closed one performed slightly better, but the tested sample wasn't too big, so I rely mostly on the algorithm understanding. Author: Maria Rydzy <majrydzy+gh@gmail.com> Closes #15687 from mrydzy/master.
-
- Oct 28, 2016
-
-
Jagadeesan authored
## What changes were proposed in this pull request? In Python 3, there is only one integer type (i.e., int), which mostly behaves like the long type in Python 2. Since Python 3 won't accept "L", so removed "L" in all examples. ## How was this patch tested? Unit tests. …rrors] Author: Jagadeesan <as2@us.ibm.com> Closes #15660 from jagadeesanas2/SPARK-18133.
-
- Oct 26, 2016
-
-
Xin Ren authored
## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-14300 Duplicated code found in scala/examples/mllib, below all deleted in this PR: - DenseGaussianMixture.scala - StreamingLinearRegression.scala ## delete reasons: #### delete: mllib/DenseGaussianMixture.scala - duplicate of mllib/GaussianMixtureExample #### delete: mllib/StreamingLinearRegression.scala - duplicate of mllib/StreamingLinearRegressionExample When merging and cleaning those code, be sure not disturb the previous example on and off blocks. ## How was this patch tested? Test with `SKIP_API=1 jekyll` manually to make sure that works well. Author: Xin Ren <iamshrek@126.com> Closes #12195 from keypointt/SPARK-14300.
-
- Oct 24, 2016
-
-
Sean Owen authored
[SPARK-17810][SQL] Default spark.sql.warehouse.dir is relative to local FS but can resolve as HDFS path ## What changes were proposed in this pull request? Always resolve spark.sql.warehouse.dir as a local path, and as relative to working dir not home dir ## How was this patch tested? Existing tests. Author: Sean Owen <sowen@cloudera.com> Closes #15382 from srowen/SPARK-17810.
-
- Oct 17, 2016
-
-
Maxime Rihouey authored
## What changes were proposed in this pull request? The python example for tf_idf with the parameter "minDocFreq" is not properly set up because the same variable is used to transform the document for both with and without the "minDocFreq" parameter. The IDF(minDocFreq=2) is stored in the variable "idfIgnore" but then it is the original variable "idf" used to transform the "tf" instead of the "idfIgnore". ## How was this patch tested? Before the results for "tfidf" and "tfidfIgnore" were the same: tfidf: (1048576,[1046921],[3.75828890549]) (1048576,[1046920],[3.75828890549]) (1048576,[1046923],[3.75828890549]) (1048576,[892732],[3.75828890549]) (1048576,[892733],[3.75828890549]) (1048576,[892734],[3.75828890549]) tfidfIgnore: (1048576,[1046921],[3.75828890549]) (1048576,[1046920],[3.75828890549]) (1048576,[1046923],[3.75828890549]) (1048576,[892732],[3.75828890549]) (1048576,[892733],[3.75828890549]) (1048576,[892734],[3.75828890549]) After the fix those are how they should be: tfidf: (1048576,[1046921],[3.75828890549]) (1048576,[1046920],[3.75828890549]) (1048576,[1046923],[3.75828890549]) (1048576,[892732],[3.75828890549]) (1048576,[892733],[3.75828890549]) (1048576,[892734],[3.75828890549]) tfidfIgnore: (1048576,[1046921],[0.0]) (1048576,[1046920],[0.0]) (1048576,[1046923],[0.0]) (1048576,[892732],[0.0]) (1048576,[892733],[0.0]) (1048576,[892734],[0.0]) Author: Maxime Rihouey <maxime.rihouey@gmail.com> Closes #15503 from maximerihouey/patch-1.
-
- Oct 10, 2016
-
-
Wenchen Fan authored
## What changes were proposed in this pull request? Global temporary view is a cross-session temporary view, which means it's shared among all sessions. Its lifetime is the lifetime of the Spark application, i.e. it will be automatically dropped when the application terminates. It's tied to a system preserved database `global_temp`(configurable via SparkConf), and we must use the qualified name to refer a global temp view, e.g. SELECT * FROM global_temp.view1. changes for `SessionCatalog`: 1. add a new field `gloabalTempViews: GlobalTempViewManager`, to access the shared global temp views, and the global temp db name. 2. `createDatabase` will fail if users wanna create `global_temp`, which is system preserved. 3. `setCurrentDatabase` will fail if users wanna set `global_temp`, which is system preserved. 4. add `createGlobalTempView`, which is used in `CreateViewCommand` to create global temp views. 5. add `dropGlobalTempView`, which is used in `CatalogImpl` to drop global temp view. 6. add `alterTempViewDefinition`, which is used in `AlterViewAsCommand` to update the view definition for local/global temp views. 7. `renameTable`/`dropTable`/`isTemporaryTable`/`lookupRelation`/`getTempViewOrPermanentTableMetadata`/`refreshTable` will handle global temp views. changes for SQL commands: 1. `CreateViewCommand`/`AlterViewAsCommand` is updated to support global temp views 2. `ShowTablesCommand` outputs a new column `database`, which is used to distinguish global and local temp views. 3. other commands can also handle global temp views if they call `SessionCatalog` APIs which accepts global temp views, e.g. `DropTableCommand`, `AlterTableRenameCommand`, `ShowColumnsCommand`, etc. changes for other public API 1. add a new method `dropGlobalTempView` in `Catalog` 2. `Catalog.findTable` can find global temp view 3. add a new method `createGlobalTempView` in `Dataset` ## How was this patch tested? new tests in `SQLViewSuite` Author: Wenchen Fan <wenchen@databricks.com> Closes #14897 from cloud-fan/global-temp-view.
-
- Oct 05, 2016
-
-
sethah authored
## What changes were proposed in this pull request? Updates user guide to reflect that LogisticRegression now supports multiclass. Also adds new examples to show multiclass training. ## How was this patch tested? Ran locally using spark-submit, run-example, and copy/paste from user guide into shells. Generated docs and verified correct output. Author: sethah <seth.hendrickson16@gmail.com> Closes #15349 from sethah/SPARK-17239.
-