- May 12, 2015
-
-
linweizhong authored
PR #5580 creates pyspark.zip at build time and sets PYTHONPATH to python/lib/pyspark.zip, so this updates the remaining reference for consistency. Author: linweizhong <linweizhong@huawei.com> Closes #6047 from Sephiroth-Lin/pyspark_pythonpath and squashes the following commits: 8cc3d96 [linweizhong] Set PYTHONPATH to python/lib/pyspark.zip rather than python/pyspark as PR#5580 we have create pyspark.zip on build
-
zsxwing authored
Just improved the Stage table when a stage is missing. [Before and after screenshots not preserved.] Author: zsxwing <zsxwing@gmail.com> Closes #6061 from zsxwing/SPARK-7534 and squashes the following commits: 09fe862 [zsxwing] Leave it blank rather than '-' 6299197 [zsxwing] Fix the Stage table when a stage is missing
-
vidmantas zemleris authored
add docs for https://issues.apache.org/jira/browse/SPARK-6994 Author: vidmantas zemleris <vidmantas@vinted.com> Closes #6030 from vidma/docs/row-with-named-fields and squashes the following commits: 241b401 [vidmantas zemleris] [SPARK-6994][SQL] Update docs for fetching Row fields by name
-
Reynold Xin authored
Author: Reynold Xin <rxin@databricks.com> Closes #6071 from rxin/parserdialect and squashes the following commits: ca2eb31 [Reynold Xin] Rename Dialect -> ParserDialect.
-
- May 11, 2015
-
-
Joshi authored
Author: Joshi <rekhajoshm@gmail.com> Author: Rekha Joshi <rekhajoshm@gmail.com> Closes #5989 from rekhajoshm/fix/SPARK-7435 and squashes the following commits: cfc9e02 [Joshi] Spark-7435[R]: updated patch for review comments 62becc1 [Joshi] SPARK-7435: Update to DataFrame e3677c9 [Rekha Joshi] Merge pull request #1 from apache/master
-
Reynold Xin authored
Author: Reynold Xin <rxin@databricks.com> Closes #6068 from rxin/drop-column and squashes the following commits: 9d7d5ec [Reynold Xin] [SPARK-7509][SQL] DataFrame.drop in Python for dropping columns.
-
Zhongshuai Pei authored
SQL
```
select key from src where 3 in (4, 5);
```
Before
```
== Optimized Logical Plan ==
Project [key#12]
 Filter 3 INSET (5,4)
  MetastoreRelation default, src, None
```
After
```
== Optimized Logical Plan ==
LocalRelation [key#228], []
```
Author: Zhongshuai Pei <799203320@qq.com> Author: DoingDone9 <799203320@qq.com> Closes #5972 from DoingDone9/InToFalse and squashes the following commits: 4c722a2 [Zhongshuai Pei] Update predicates.scala abe2bbb [Zhongshuai Pei] Update Optimizer.scala fa461a5 [Zhongshuai Pei] Update Optimizer.scala e34c28a [Zhongshuai Pei] Update predicates.scala 24739bd [Zhongshuai Pei] Update ConstantFoldingSuite.scala f4dbf50 [Zhongshuai Pei] Update ConstantFoldingSuite.scala 35ceb7a [Zhongshuai Pei] Update Optimizer.scala 36c194e [Zhongshuai Pei] Update Optimizer.scala 2e8f6ca [Zhongshuai Pei] Update Optimizer.scala 14952e2 [Zhongshuai Pei] Merge pull request #13 from apache/master f03fe7f [Zhongshuai Pei] Merge pull request #12 from apache/master f12fa50 [Zhongshuai Pei] Merge pull request #10 from apache/master f61210c [Zhongshuai Pei] Merge pull request #9 from apache/master 34b1a9a [Zhongshuai Pei] Merge pull request #8 from apache/master 802261c [DoingDone9] Merge pull request #7 from apache/master d00303b [DoingDone9] Merge pull request #6 from apache/master 98b134f [DoingDone9] Merge pull request #5 from apache/master 161cae3 [DoingDone9] Merge pull request #4 from apache/master c87e8b6 [DoingDone9] Merge pull request #3 from apache/master cb1852d [DoingDone9] Merge pull request #2 from apache/master c3f046f [DoingDone9] Merge pull request #1 from apache/master
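The effect of the plan change above can be sketched outside Catalyst. This is a minimal Python illustration, not the optimizer rule from this PR: `fold_literal_in` and `plan_filter` are hypothetical names, and NULL semantics are ignored. When both sides of an `IN` are literals, the predicate is evaluated once at planning time, and a false result means the table scan can be skipped entirely.

```python
def fold_literal_in(value, candidates):
    """Evaluate `value IN (candidates)` once, assuming every operand is a non-NULL literal."""
    return value in set(candidates)


def plan_filter(scan, value, candidates):
    """Return the filtered rows; `scan` is a callable that reads the table (hypothetical)."""
    if not fold_literal_in(value, candidates):
        # The predicate is constantly false, so no row can pass:
        # return an empty relation without touching the table.
        return []
    # The predicate is constantly true, so every row passes.
    return list(scan())


scans = []


def scan_src():
    scans.append("src")  # record that the table was actually read
    return [1, 2, 3]


empty = plan_filter(scan_src, 3, (4, 5))
full = plan_filter(scan_src, 3, (3, 5))
```

Only the second call reads the table; the first one is answered from the literals alone.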
-
Cheng Hao authored
This is a follow up of #5876 and should be merged after #5876. Let's wait for unit testing result from Jenkins. Author: Cheng Hao <hao.cheng@intel.com> Closes #5963 from chenghao-intel/useIsolatedClient and squashes the following commits: f87ace6 [Cheng Hao] remove the TODO and add `resolved condition` for HiveTable a8260e8 [Cheng Hao] Update code as feedback f4e243f [Cheng Hao] remove the serde setting for SequenceFile d166afa [Cheng Hao] style issue d25a4aa [Cheng Hao] Add SerDe support for CTAS
-
Reynold Xin authored
This should also close https://github.com/apache/spark/pull/5870 Author: Reynold Xin <rxin@databricks.com> Closes #6066 from rxin/dropDups and squashes the following commits: 130692f [Reynold Xin] [SPARK-7324][SQL] DataFrame.dropDuplicates
-
Tathagata Das authored
[SPARK-7530] [STREAMING] Added StreamingContext.getState() to expose the current state of the context Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #6058 from tdas/SPARK-7530 and squashes the following commits: 80ee0e6 [Tathagata Das] STARTED --> ACTIVE 3da6547 [Tathagata Das] Added synchronized dd88444 [Tathagata Das] Added more docs e1a8505 [Tathagata Das] Fixed comment length 89f9980 [Tathagata Das] Change to Java enum and added Java test 7c57351 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-7530 dd4e702 [Tathagata Das] Addressed comments. 3d56106 [Tathagata Das] Added Mima excludes 2b86ba1 [Tathagata Das] Added scala docs. 1722433 [Tathagata Das] Fixed style 976b094 [Tathagata Das] Added license 0585130 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-7530 e0f0a05 [Tathagata Das] Added getState and exposed StreamingContextState
-
Xusen Yin authored
JIRA issue [here](https://issues.apache.org/jira/browse/SPARK-5893). Note that the `buckets` parameter, an array of `Double`, acts as a list of split points. For example,
```scala
buckets = Array(-0.5, 0.0, 0.5)
```
splits the real line into 4 ranges, (-inf, -0.5], (-0.5, 0.0], (0.0, 0.5], (0.5, +inf), which are encoded as 0, 1, 2, 3. Author: Xusen Yin <yinxusen@gmail.com> Author: Joseph K. Bradley <joseph@databricks.com> Closes #5980 from yinxusen/SPARK-5893 and squashes the following commits: dc8c843 [Xusen Yin] Merge pull request #4 from jkbradley/yinxusen-SPARK-5893 1ca973a [Joseph K. Bradley] one more bucketizer test 34f124a [Joseph K. Bradley] Removed lowerInclusive, upperInclusive params from Bucketizer, and used splits instead. eacfcfa [Xusen Yin] change ML attribute from splits into buckets c3cc770 [Xusen Yin] add more unit test for binary search 3a16cc2 [Xusen Yin] refine comments and names ac77859 [Xusen Yin] fix style error fb30d79 [Xusen Yin] fix and test binary search 2466322 [Xusen Yin] refactor Bucketizer 11fb00a [Xusen Yin] change it into an Estimator 998bc87 [Xusen Yin] check buckets 4024cf1 [Xusen Yin] add test suite 5fe190e [Xusen Yin] add bucketizer
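The encoding described in this message can be sketched with a binary search. This is a minimal Python illustration of the ranges as stated above (right-closed intervals), not the Bucketizer code itself, and the merged implementation may treat boundaries differently; `bucket_index` is a hypothetical name.

```python
from bisect import bisect_left


def bucket_index(splits, x):
    """Map x to a bucket index, given sorted split points.

    bisect_left counts the split points strictly less than x, so a value
    equal to a split point falls into the bucket that ends at that point.
    """
    return bisect_left(splits, x)


splits = [-0.5, 0.0, 0.5]
encoded = [bucket_index(splits, v) for v in (-1.0, -0.5, -0.4, 0.0, 0.5, 0.6)]
```

With three split points there are four buckets, so the indices range over 0 to 3.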
-
Reynold Xin authored
So users that are interested in this can track it easily. Author: Reynold Xin <rxin@databricks.com> Closes #6067 from rxin/SPARK-7550 and squashes the following commits: ee0e34c [Reynold Xin] Updated DataFrame.saveAsTable Hive warning to include SPARK-7550 ticket.
-
Reynold Xin authored
Author: Reynold Xin <rxin@databricks.com> Closes #6062 from rxin/agg-retain-doc and squashes the following commits: 43e511e [Reynold Xin] [SPARK-7462][SQL] Update documentation for retaining grouping columns in DataFrames.
-
madhukar authored
Author: madhukar <phatak.dev@gmail.com> Closes #5654 from phatak-dev/master and squashes the following commits: 386f407 [madhukar] #5654 updated for all the methods 2c997c5 [madhukar] Merge branch 'master' of https://github.com/apache/spark 00bc819 [madhukar] Merge branch 'master' of https://github.com/apache/spark 2a802c6 [madhukar] #5654 updated the doc according to comments 866e8df [madhukar] [SPARK-7084] improve saveAsTable documentation
-
Reynold Xin authored
As a follow-up to https://github.com/apache/spark/pull/5944 Author: Reynold Xin <rxin@databricks.com> Closes #6064 from rxin/jointype-better-error and squashes the following commits: 7629bf7 [Reynold Xin] [SQL] Show better error messages for incorrect join types in DataFrames.
-
Sean Owen authored
Author: Sean Owen <sowen@cloudera.com> Closes #6063 from srowen/FixRunningTestsLink and squashes the following commits: db62018 [Sean Owen] Fix the link to test building info on the wiki
-
LCY Vincent authored
Should this sync up with the join types listed here? https://github.com/apache/spark/blob/119f45d61d7b48d376cca05e1b4f0c7fcf65bfa8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/joinTypes.scala#L26 Author: LCY Vincent <lauchunyin@gmail.com> Closes #5944 from vincentlaucy/master and squashes the following commits: fc0e454 [LCY Vincent] Update DataFrame.scala
-
jerryshao authored
Currently the file is never closed correctly after the iteration finishes. This changes the code to use `CompletionIterator` to avoid leaking the resource. Author: jerryshao <saisai.shao@intel.com> Closes #6050 from jerryshao/close-file-correctly and squashes the following commits: 52dfaf5 [jerryshao] Close files correctly when iterator is finished
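The idea behind a completion iterator can be sketched in a few lines. This is a minimal Python analogue, assuming a wrapper that runs a callback exactly once when the underlying iterator is exhausted; it is not Spark's `CompletionIterator`, only an illustration of the pattern.

```python
import io


class CompletionIterator:
    """Wrap an iterator and run `on_complete` once, when it is exhausted."""

    def __init__(self, inner, on_complete):
        self._inner = iter(inner)
        self._on_complete = on_complete
        self._done = False

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self._inner)
        except StopIteration:
            if not self._done:
                self._done = True
                self._on_complete()  # release the resource exactly once
            raise


# An in-memory stream stands in for the file being read.
stream = io.StringIO("a\nb\n")
lines = list(CompletionIterator(stream, stream.close))
```

The consumer does not need to know about the resource; closing happens as a side effect of reaching the end.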
-
gchen authored
JIRA: https://issues.apache.org/jira/browse/SPARK-7516 In sql-programming-guide, the deprecated Python DataFrame API `inferSchema()` should be replaced by `createDataFrame()`: `schemaPeople = sqlContext.inferSchema(people)` -> `schemaPeople = sqlContext.createDataFrame(people)` Author: gchen <chenguancheng@gmail.com> Closes #6041 from gchen/python-docs and squashes the following commits: c27eb7c [gchen] replace inferSchema() with createDataFrame()
-
Kousuke Saruta authored
PySpark on YARN in cluster mode is now supported, so this updates the docs. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #6040 from sarutak/update-doc-for-pyspark-on-yarn and squashes the following commits: ad9f88c [Kousuke Saruta] Brushed up sentences 469fd2e [Kousuke Saruta] Merge branch 'master' of https://github.com/apache/spark into update-doc-for-pyspark-on-yarn fcfdb92 [Kousuke Saruta] Updated doc for PySpark on YARN with cluster mode
-
Steve Loughran authored
Patch for SPARK-7508. This logs a warning and then generates a response that includes the message body and stack trace as text/plain, no-cache. The status code is 500. In practice (in some tests in SPARK-1537, to be precise), Jetty gets in between this servlet and the web response the user sees: the body of the response is lost for any error response (500, and even 404 and bad request). The standard Jetty handlers must be getting in the way. This patch doesn't address that; it ensures that 1. if the Jetty handlers were put to one side, users would see the errors, and 2. at least the exceptions appear in the server-side logs. This is better than users saying "I saw a 500 error" when there is nothing in the logs to show what went wrong. Author: Steve Loughran <stevel@hortonworks.com> Closes #6033 from steveloughran/stevel/feature/SPARK-7508-JettyUtils and squashes the following commits: 584836f [Steve Loughran] SPARK-7508 drop trailing semicolon ad6f185 [Steve Loughran] SPARK-7508: jetty handles exception reporting itself; spark just sets this up and logs exceptions before being relayed 258d9f9 [Steve Loughran] SPARK-7508 fix typo manually-edited before patch pushed 69c8263 [Steve Loughran] SPARK-7508 JettyUtils-generated servlets to log & report all errors
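The behaviour described in the message (log the failure, then answer with a 500 whose body carries the message and stack trace) can be sketched as follows. This is a Python illustration of the description only, not the JettyUtils code; the later squashed commits suggest the final patch leaves reporting to Jetty. The `handle` function, the logger name, and the use of a `Cache-Control` header for "no-cache" are assumptions.

```python
import logging
import traceback

logger = logging.getLogger("jetty_utils_sketch")  # hypothetical logger name


def handle(render, request):
    """Call `render(request)`; on failure, log it and build a plain-text 500 response."""
    try:
        return 200, {"Content-Type": "text/html"}, render(request)
    except Exception as exc:
        # Make sure the failure is visible in the server-side log.
        logger.warning("GET %s failed: %s", request, exc, exc_info=True)
        headers = {"Content-Type": "text/plain", "Cache-Control": "no-cache"}
        body = f"{exc}\n{traceback.format_exc()}"
        return 500, headers, body


def broken_page(request):
    raise ValueError("boom")


status, headers, body = handle(broken_page, "/jobs")
```

Even if an intermediate handler drops the response body, the warning in the log still records what went wrong.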
-
Sandy Ryza authored
This is difficult to write a test for because it relies on the latest version of YARN, but I verified manually that the patch does pass along the label expression on this version and containers are successfully launched. Author: Sandy Ryza <sandy@cloudera.com> Closes #5242 from sryza/sandy-spark-6470 and squashes the following commits: 6af87b9 [Sandy Ryza] Change info to warning 6e22d99 [Sandy Ryza] [YARN] SPARK-6470. Add support for YARN node labels.
-
Reynold Xin authored
Updated Java, Scala, Python, and R. Author: Reynold Xin <rxin@databricks.com> Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #5996 from rxin/groupby-retain and squashes the following commits: aac7119 [Reynold Xin] Merge branch 'groupby-retain' of github.com:rxin/spark into groupby-retain f6858f6 [Reynold Xin] Merge branch 'master' into groupby-retain 5f923c0 [Reynold Xin] Merge pull request #15 from shivaram/sparkr-groupby-retrain c1de670 [Shivaram Venkataraman] Revert workaround in SparkR to retain grouped cols Based on reverting code added in commit https://github.com/amplab-extras/spark/commit/9a6be746efc9fafad88122fa2267862ef87aa0e1 b8b87e1 [Reynold Xin] Fixed DataFrameJoinSuite. d910141 [Reynold Xin] Updated rest of the files 1e6e666 [Reynold Xin] [SPARK-7462] By default retain group by columns in aggregate
-
Tathagata Das authored
[SPARK-7361] [STREAMING] Throw unambiguous exception when attempting to start multiple StreamingContexts in the same JVM Currently, attempting to start a StreamingContext while another one is already started throws a confusing exception saying that the action name JobScheduler is already registered. Instead, it is best to throw a proper exception, since this is not supported. Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #5907 from tdas/SPARK-7361 and squashes the following commits: fb81c4a [Tathagata Das] Fix typo a9cd5bb [Tathagata Das] Added startSite to StreamingContext 5fdfc0d [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into SPARK-7361 5870e2b [Tathagata Das] Added check for multiple streaming contexts
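The kind of check described here can be sketched with a process-wide slot guarded by a lock. This is a minimal Python illustration, not the StreamingContext code; the class name `StreamingContextSketch` and the error text are hypothetical.

```python
import threading


class StreamingContextSketch:
    """Toy context that allows only one started instance per process."""

    _lock = threading.Lock()
    _active = None  # the currently started context, if any

    def start(self):
        cls = StreamingContextSketch
        with cls._lock:
            if cls._active is not None and cls._active is not self:
                # Fail early with a message that names the real problem.
                raise RuntimeError(
                    "Only one streaming context may be started at a time; "
                    "stop the active one first."
                )
            cls._active = self

    def stop(self):
        cls = StreamingContextSketch
        with cls._lock:
            if cls._active is self:
                cls._active = None


first, second = StreamingContextSketch(), StreamingContextSketch()
first.start()
try:
    second.start()
    error = None
except RuntimeError as exc:
    error = str(exc)
first.stop()
second.start()  # allowed once the first context has stopped
```

The check runs before any shared component is registered, so the caller sees the unsupported usage rather than a secondary naming conflict.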
-
Bryan Cutler authored
As is, to specify this option on command line, you have to escape the angle brackets. Author: Bryan Cutler <bjcutler@us.ibm.com> Closes #6049 from BryanCutler/dataFormat-option-7522 and squashes the following commits: b34afb4 [Bryan Cutler] [SPARK-7522] Removed angle brackets from dataFormat option
-
Yanbo Liang authored
Author: Yanbo Liang <ybliang8@gmail.com> Closes #6044 from yanboliang/spark-6092 and squashes the following commits: 726a9b1 [Yanbo Liang] add newRankingMetrics 33f649c [Yanbo Liang] Add RankingMetrics in PySpark/MLlib
-
Wesley Miao authored
tdas https://issues.apache.org/jira/browse/SPARK-7326 The problem most likely resides in the DStream.slice() implementation, as shown below.

    def slice(fromTime: Time, toTime: Time): Seq[RDD[T]] = {
      if (!isInitialized) {
        throw new SparkException(this + " has not been initialized")
      }
      if (!(fromTime - zeroTime).isMultipleOf(slideDuration)) {
        logWarning("fromTime (" + fromTime + ") is not a multiple of slideDuration (" + slideDuration + ")")
      }
      if (!(toTime - zeroTime).isMultipleOf(slideDuration)) {
        logWarning("toTime (" + toTime + ") is not a multiple of slideDuration (" + slideDuration + ")")
      }
      val alignedToTime = toTime.floor(slideDuration, zeroTime)
      val alignedFromTime = fromTime.floor(slideDuration, zeroTime)
      logInfo("Slicing from " + fromTime + " to " + toTime +
        " (aligned to " + alignedFromTime + " and " + alignedToTime + ")")
      alignedFromTime.to(alignedToTime, slideDuration).flatMap(time => {
        if (time >= zeroTime) getOrCompute(time) else None
      })
    }

Here, after performing floor() on both fromTime and toTime, the results (alignedFromTime - zeroTime) and (alignedToTime - zeroTime) may no longer be multiples of slideDuration, which makes the isTimeValid() check fail for all the remaining computation. The fix is to add a new floor() function in Time.scala that respects zeroTime while performing the floor:

    def floor(that: Duration, zeroTime: Time): Time = {
      val t = that.milliseconds
      new Time(((this.millis - zeroTime.milliseconds) / t) * t + zeroTime.milliseconds)
    }

And then change DStream.slice to call this new floor function, passing in its zeroTime.

    val alignedToTime = toTime.floor(slideDuration, zeroTime)
    val alignedFromTime = fromTime.floor(slideDuration, zeroTime)

This way alignedToTime and alignedFromTime are *really* aligned with respect to zeroTime, whose value is not really 0.
Author: Wesley Miao <wesley.miao@gmail.com> Author: Wesley <wesley.miao@autodesk.com> Closes #5871 from wesleymiao/spark-7326 and squashes the following commits: 82a4d8c [Wesley Miao] [SPARK-7326] [STREAMING] Performing window() on a WindowedDStream dosen't work all the time 48b4dc0 [Wesley] [SPARK-7326] [STREAMING] Performing window() on a WindowedDStream doesn't work all the time 6ade399 [Wesley] [SPARK-7326] [STREAMING] Performing window() on a WindowedDStream doesn't work all the time 2611745 [Wesley Miao] [SPARK-7326] [STREAMING] Performing window() on a WindowedDStream doesn't work all the time
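The arithmetic behind the fix can be checked with small numbers. This is a Python sketch of the two floor variants using plain integers for milliseconds; the function names are hypothetical, and Python's `//` matches the Scala integer division in the message only for times at or after `zero_ms`.

```python
def floor_plain(t_ms, slide_ms):
    """Floor to a multiple of slide_ms measured from 0."""
    return (t_ms // slide_ms) * slide_ms


def floor_from_zero(t_ms, slide_ms, zero_ms):
    """Floor to a multiple of slide_ms measured from zero_ms, as in the new floor()."""
    return ((t_ms - zero_ms) // slide_ms) * slide_ms + zero_ms


zero_ms, slide_ms, t_ms = 3, 10, 27

plain = floor_plain(t_ms, slide_ms)                  # aligned to 0, 10, 20, ...
aligned = floor_from_zero(t_ms, slide_ms, zero_ms)   # aligned to 3, 13, 23, ...

plain_offset = (plain - zero_ms) % slide_ms
aligned_offset = (aligned - zero_ms) % slide_ms
```

Only the second variant leaves a zero remainder relative to `zero_ms`, which is the condition a validity check based on `(time - zeroTime)` being a multiple of the slide duration needs.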
-
tianyi authored
Bug description:
1. There are extra commas at the top of the session list.
2. The time format in the "Start at:" part is not the same as elsewhere.
3. The total number of online sessions is wrong.
Author: tianyi <tianyi.asiainfo@gmail.com> Closes #6048 from tianyi/SPARK-7519 and squashes the following commits: ed366b7 [tianyi] fix bug
-
- May 10, 2015
-
-
Shivaram Venkataraman authored
Since the RDD object might be a Pipelined RDD we should use `getJRDD` to get the right handle to the Java object. Fixes the bug reported at http://stackoverflow.com/questions/30057702/sparkr-filterrdd-and-flatmap-not-working cc concretevitamin Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #6035 from shivaram/sparkr-show-bug and squashes the following commits: d70145c [Shivaram Venkataraman] Fix RDD's show method to use getJRDD Fixes the bug reported at http://stackoverflow.com/questions/30057702/sparkr-filterrdd-and-flatmap-not-working
-
Glenn Weidner authored
Modified 2 files: python/pyspark/ml/param/_shared_params_code_gen.py python/pyspark/ml/param/shared.py Generated shared.py on Linux using Python 2.6.6 on Redhat Enterprise Linux Server 6.6. python _shared_params_code_gen.py > shared.py Only changed maxIter, regParam, rawPredictionCol based on strings from SharedParamsCodeGen.scala. Note warning was displayed when committing shared.py: warning: LF will be replaced by CRLF in python/pyspark/ml/param/shared.py. Author: Glenn Weidner <gweidner@us.ibm.com> Closes #6023 from gweidner/br-7427 and squashes the following commits: db72e32 [Glenn Weidner] [SPARK-7427] [PySpark] Make sharedParams match in Scala, Python 825e4a9 [Glenn Weidner] [SPARK-7427] [PySpark] Make sharedParams match in Scala, Python e6a865e [Glenn Weidner] [SPARK-7427] [PySpark] Make sharedParams match in Scala, Python 1eee702 [Glenn Weidner] Merge remote-tracking branch 'upstream/master' 1ac10e5 [Glenn Weidner] Merge remote-tracking branch 'upstream/master' cafd104 [Glenn Weidner] Merge remote-tracking branch 'upstream/master' 9bea1eb [Glenn Weidner] Merge remote-tracking branch 'upstream/master' 4a35c20 [Glenn Weidner] Merge remote-tracking branch 'upstream/master' 9790cbe [Glenn Weidner] Merge remote-tracking branch 'upstream/master' d9c30f4 [Glenn Weidner] [SPARK-7275] [SQL] [WIP] Make LogicalRelation public
-
Kirill A. Korinskiy authored
This adds a simple PCA wrapper that makes it easy to transform vectors with PCA, for example inside a LabeledPoint or another more complex structure. Example of usage:
```
import org.apache.spark.mllib.regression.LinearRegressionWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.feature.PCA

val data = sc.textFile("data/mllib/ridge-data/lpsa.data").map { line =>
  val parts = line.split(',')
  LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).split(' ').map(_.toDouble)))
}.cache()

val splits = data.randomSplit(Array(0.6, 0.4), seed = 11L)
val training = splits(0).cache()
val test = splits(1)

val pca = PCA.create(training.first().features.size/2, data.map(_.features))
val training_pca = training.map(p => p.copy(features = pca.transform(p.features)))
val test_pca = test.map(p => p.copy(features = pca.transform(p.features)))

val numIterations = 100
val model = LinearRegressionWithSGD.train(training, numIterations)
val model_pca = LinearRegressionWithSGD.train(training_pca, numIterations)

val valuesAndPreds = test.map { point =>
  val score = model.predict(point.features)
  (score, point.label)
}

val valuesAndPreds_pca = test_pca.map { point =>
  val score = model_pca.predict(point.features)
  (score, point.label)
}

val MSE = valuesAndPreds.map{case(v, p) => math.pow((v - p), 2)}.mean()
val MSE_pca = valuesAndPreds_pca.map{case(v, p) => math.pow((v - p), 2)}.mean()

println("Mean Squared Error = " + MSE)
println("PCA Mean Squared Error = " + MSE_pca)
```
Author: Kirill A. Korinskiy <catap@catap.ru> Author: Joseph K. Bradley <joseph@databricks.com> Closes #4304 from catap/pca and squashes the following commits: 501bcd9 [Joseph K. Bradley] Small updates: removed k from Java-friendly PCA fit(). In PCASuite, converted results to set for comparison. Added an error message for bad k in PCA. 9dcc02b [Kirill A. Korinskiy] [SPARK-5521] fix scala style 1892a06 [Kirill A. Korinskiy] [SPARK-5521] PCA wrapper for easy transform vectors
-
Joseph K. Bradley authored
Fixes bug with PySpark cvModel not having UID Also made small PySpark fixes: Evaluator should inherit from Params. MockModel should inherit from Model. CC: mengxr Author: Joseph K. Bradley <joseph@databricks.com> Closes #5968 from jkbradley/pyspark-cv-uid and squashes the following commits: 57f13cd [Joseph K. Bradley] Made CrossValidatorModel call parent init in PySpark
-
Cheng Lian authored
Author: Cheng Lian <lian@databricks.com> Closes #6038 from liancheng/fix-typo and squashes the following commits: 572c2a4 [Cheng Lian] Fixes variable name typo
-
Oleg Sidorkin authored
The issue appears when creating a DataFrame with a sqlContext.load("jdbc"...) statement whose "dbtable" contains a query with renamed columns. If the original column is used once in the SQL query, the resulting DataFrame contains the non-renamed column. If the original column is used several times with different aliases, sqlContext.load fails. The original implementation of JDBCRDD.resolveTable uses getColumnName to detect column names for the RDD schema. The suggested implementation uses getColumnLabel, which is aware of the SQL "AS" clause, to handle column renames in the SQL statement. Readings: http://stackoverflow.com/questions/4271152/getcolumnlabel-vs-getcolumnname http://stackoverflow.com/questions/12259829/jdbc-getcolumnname-getcolumnlabel-db2 The official documentation is unfortunately a bit misleading about the purpose of the "suggested title", but it clearly defines the behavior of the AS keyword in a SQL statement. http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html getColumnLabel - Gets the designated column's suggested title for use in printouts and displays. The suggested title is usually specified by the SQL AS clause. If a SQL AS is not specified, the value returned from getColumnLabel will be the same as the value returned by the getColumnName method. Author: Oleg Sidorkin <oleg.sidorkin@gmail.com> Closes #6032 from osidorkin/master and squashes the following commits: 10fc44b [Oleg Sidorkin] [SPARK-7345][SQL] Regression test for JDBCSuite (resolved scala style test error) 2aaf6f7 [Oleg Sidorkin] [SPARK-7345][SQL] Regression test for JDBCSuite (renamed fields in JDBC query) b7d5b22 [Oleg Sidorkin] [SPARK-7345][SQL] Regression test for JDBCSuite 09559a0 [Oleg Sidorkin] [SPARK-7345][SQL] Spark cannot detect renamed columns using JDBC connector
-
Yanbo Liang authored
https://issues.apache.org/jira/browse/SPARK-6091 Author: Yanbo Liang <ybliang8@gmail.com> Closes #6011 from yanboliang/spark-6091 and squashes the following commits: bb3e4ba [Yanbo Liang] trigger jenkins 53c045d [Yanbo Liang] keep compatibility for python 2.6 972d5ac [Yanbo Liang] Add MulticlassMetrics in PySpark/MLlib
-
- May 09, 2015
-
-
Yuhao Yang authored
jira: https://issues.apache.org/jira/browse/SPARK-7475 Add a new argument to specify the algorithm applied to LDA, to exhibit the basic usage of LDAOptimizer. cc jkbradley Author: Yuhao Yang <hhbyyh@gmail.com> Closes #6000 from hhbyyh/ldaExample and squashes the following commits: 0a7e2bc [Yuhao Yang] fix according to comments 5810b0f [Yuhao Yang] adjust ldaExample for online LDA
-
tedyu authored
Author: tedyu <yuzhihong@gmail.com> Closes #6031 from tedyu/master and squashes the following commits: 5c2580c [tedyu] Reference fasterxml.jackson.version in sql/core/pom.xml ff2a44f [tedyu] Merge branch 'master' of github.com:apache/spark 28c8394 [tedyu] Upgrade version of jackson-databind in sql/core/pom.xml
-
tedyu authored
Currently the version of jackson-databind in sql/core/pom.xml is 2.3.0, which is older than the version specified in the root pom.xml. This PR upgrades the version in sql/core/pom.xml so that they're consistent. Author: tedyu <yuzhihong@gmail.com> Closes #6028 from tedyu/master and squashes the following commits: 28c8394 [tedyu] Upgrade version of jackson-databind in sql/core/pom.xml
-
dobashim authored
A small fix for the wrong URL of the API documentation (org.apache.spark.streaming.scheduler.StreamingListener). Author: dobashim <dobashim@oss.nttdata.co.jp> Closes #6024 from dobashim/master and squashes the following commits: ac9a955 [dobashim] [STREAMING][DOCS] Fix wrong url about API docs of StreamingListener
-
Kousuke Saruta authored
When we use Spark on YARN and view AllJobPage via the ResourceManager's proxy, the link URL in the objects that represent each job on the timeline view is wrong. In timeline-view.js, the link is generated as follows.
```
window.location.href = "job/?id=" + getJobId(this);
```
This assumes the URL displayed in the web browser ends with "jobs/", but when we access AllJobPage via the proxy, the URL displayed does not end with "jobs/". The proxy doesn't return status code 301 or 302, so the URL displayed still indicates the base URL, not "/jobs", even though AllJobPage is displayed. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #5947 from sarutak/fix-link-in-timeline and squashes the following commits: aaf40e1 [Kousuke Saruta] Added Copyright for vis.js 01bee7b [Kousuke Saruta] Fixed timeline-view.js in order to get correct href
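The way a relative link resolves differently under the two base URLs can be reproduced with standard URL resolution. This Python sketch only illustrates the problem described above, not the change made to timeline-view.js; the host names and the application id are made up.

```python
from urllib.parse import urljoin

relative_link = "job/?id=3"  # same shape as the href built in timeline-view.js

# Direct access: the browser's URL ends with "jobs/".
direct = urljoin("http://driver:4040/jobs/", relative_link)

# Via the proxy: the browser's URL is still the application base URL.
proxied = urljoin("http://rm:8088/proxy/application_1/", relative_link)
```

Here `direct` resolves under `/jobs/`, while `proxied` loses the `jobs/` segment, which is why the timeline links break behind the proxy.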
-