  1. Nov 17, 2015
  2. Nov 16, 2015
  3. Nov 15, 2015
    • [SPARK-9928][SQL] Removal of LogicalLocalTable · b58765ca
      gatorsmile authored
      Has LogicalLocalTable in ExistingRDD.scala been superseded by LocalRelation in LocalRelation.scala?
      
      Is there any reason to keep this class?
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #9717 from gatorsmile/LogicalLocalTable.
    • [SPARK-10500][SPARKR] sparkr.zip cannot be created if /R/lib is unwritable · 835a79d7
      Sun Rui authored
      The basic idea is as follows:
      The archive of the SparkR package itself, sparkr.zip, is created during the build process and is included in the Spark binary distribution. It is not modified after the distribution is installed, because the directory it resides in ($SPARK_HOME/R/lib) may not be writable.
      
      When jars or Spark packages specified with the "--jars" or "--packages" command-line option contain R source code, a temporary directory is created by calling Utils.createTempDir(), and the R packages built from that source code are installed there. The temporary directory is writable, does not interfere with the temporary directories of other concurrent SparkR sessions, and is deleted when the SparkR session ends. The R binary packages installed in the temporary directory are then packed into an archive named rpkg.zip.
      
      sparkr.zip and rpkg.zip are distributed to the cluster in YARN modes.
      
      Distribution of rpkg.zip in Standalone mode is not supported in this PR and will be addressed in another PR.
      
      Various R files are updated to accept multiple library paths (one for the SparkR package, the other for other R packages) so that these packages can be accessed in R.
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #9390 from sun-rui/SPARK-10500.
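The temporary-directory scheme described in SPARK-10500 can be sketched in Python. This is a minimal, hedged illustration, not SparkR's implementation, which uses Utils.createTempDir() and R tooling. `pack_session_packages` is a hypothetical name, and writing raw bytes stands in for building and installing R packages. The read-only sparkr.zip is never touched. Only the per-session directory is written to.

```python
import os
import tempfile
import zipfile
from typing import Mapping


def pack_session_packages(pkg_files: Mapping[str, bytes], session_dir: str) -> str:
    """Write 'installed' package files under session_dir/lib and zip them as rpkg.zip.

    pkg_files maps relative paths to file contents. It stands in for the output of
    building and installing R source packages (hypothetical simplification).
    """
    lib_dir = os.path.join(session_dir, "lib")
    for rel_path, data in pkg_files.items():
        dest = os.path.join(lib_dir, rel_path)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        with open(dest, "wb") as f:
            f.write(data)

    # Pack everything under lib/ into rpkg.zip, with paths relative to lib/.
    archive = os.path.join(session_dir, "rpkg.zip")
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(lib_dir):
            for name in files:
                full_path = os.path.join(root, name)
                zf.write(full_path, arcname=os.path.relpath(full_path, lib_dir))
    return archive


# Each session gets its own writable temporary directory, removed when the block exits.
with tempfile.TemporaryDirectory(prefix="sparkr-") as session_dir:
    archive = pack_session_packages({"mypkg/DESCRIPTION": b"Package: mypkg\n"}, session_dir)
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()

print(names)  # ['mypkg/DESCRIPTION']
```

Because each session works in its own fresh directory, concurrent sessions cannot clobber each other's archives. Cleanup happens automatically when the context manager exits.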
    • [SPARK-11086][SPARKR] Use dropFactors column-wise instead of nested loop when createDataFrame · d7d9fa0b
      zero323 authored
      Use `dropFactors` column-wise instead of nested loop when `createDataFrame` from a `data.frame`
      
      Currently, SparkR `createDataFrame` uses a nested loop to convert factors to characters when called on a local data.frame. It works, but it is extremely slow, especially with data.table (roughly two orders of magnitude slower than the PySpark/Pandas version on a DataFrame of 1M rows x 2 columns).
      
      A simple improvement is to apply `dropFactor` column-wise and then reshape the output list.
      
      It should at least partially address [SPARK-8277](https://issues.apache.org/jira/browse/SPARK-8277).
      
      Author: zero323 <matthew.szymkiewicz@gmail.com>
      
      Closes #9099 from zero323/SPARK-11086.
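The difference between the two conversion strategies in SPARK-11086 can be illustrated with a Python sketch. It assumes a local data frame modeled as a list of columns. `Factor`, `drop_factor`, `rows_nested`, and `rows_columnwise` are hypothetical stand-ins for the R code, and `Factor` uses 0-based codes, unlike R factors.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Factor:
    """Hypothetical stand-in for an R factor: 0-based codes indexing into levels."""
    levels: List[str]
    codes: List[int]

    def __len__(self) -> int:
        return len(self.codes)


def drop_factor(col: Any) -> list:
    """Return the column as a plain list, converting Factor codes to level strings."""
    if isinstance(col, Factor):
        return [col.levels[code] for code in col.codes]
    return list(col)


def rows_nested(columns: List[Any]) -> List[list]:
    """Old approach: convert each cell (i, j) individually inside a nested loop."""
    n_rows = len(columns[0])
    rows = []
    for i in range(n_rows):
        row = []
        for col in columns:
            row.append(col.levels[col.codes[i]] if isinstance(col, Factor) else col[i])
        rows.append(row)
    return rows


def rows_columnwise(columns: List[Any]) -> List[list]:
    """New approach: drop factors once per column, then reshape columns into rows."""
    cleaned = [drop_factor(col) for col in columns]
    return [list(row) for row in zip(*cleaned)]


cols = [Factor(levels=["a", "b"], codes=[0, 1, 0]), [1.5, 2.5, 3.5]]
rows = rows_columnwise(cols)
print(rows)  # [['a', 1.5], ['b', 2.5], ['a', 3.5]]
```

Both functions return the same rows. The column-wise version converts each column in a single pass and then transposes, which is the reshaping step the commit message refers to.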
    • [SPARK-10181][SQL] Do kerberos login for credentials during hive client initialization · 72c1d68b
      Yu Gao authored
      On driver process startup, UserGroupInformation.loginUserFromKeytab is called with the principal and keytab passed in. The static variable UserGroupInformation.loginUser is therefore set to that principal, with the Kerberos credentials saved in its private credential set, and all threads within the driver process are supposed to see and use these login credentials to authenticate with Hive and Hadoop. However, because of IsolatedClientLoader, the UserGroupInformation class is not shared with Hive metastore clients. Instead, it is loaded separately and therefore cannot see the Kerberos login credentials prepared in the main thread.
      
      The first proposed fix would cause other classloader conflict errors and is not an appropriate solution. This change instead performs the Kerberos login during Hive client initialization, which makes the credentials available to that particular Hive client instance.
      
      yhuai, please take a look and let me know. If you are not the right person to talk to, could you point me to someone responsible for this?
      
      Author: Yu Gao <ygao@us.ibm.com>
      Author: gaoyu <gaoyu@gaoyu-macbookpro.roam.corp.google.com>
      Author: Yu Gao <crystalgaoyu@gmail.com>
      
      Closes #9272 from yolandagao/master.
    • [SPARK-11738] [SQL] Making ArrayType orderable · 3e2e1873
      Yin Huai authored
      https://issues.apache.org/jira/browse/SPARK-11738
      
      Author: Yin Huai <yhuai@databricks.com>
      
      Closes #9718 from yhuai/makingArrayOrderable.
    • [SPARK-11672][ML] set active SQLContext in JavaDefaultReadWriteSuite · 64e55511
      Xiangrui Meng authored
      The same as #9694, but for the Java test suite. yhuai
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #9719 from mengxr/SPARK-11672.4.
    • [SPARK-11734][SQL] Rename TungstenProject -> Project, TungstenSort -> Sort · d22fc108
      Reynold Xin authored
      I didn't remove the old Sort operator, since we still use it in randomized tests. I moved it into the test module and renamed it ReferenceSort.
      
      Author: Reynold Xin <rxin@databricks.com>
      
      Closes #9700 from rxin/SPARK-11734.
  4. Nov 14, 2015