FORK.md

Differences from upstream

  • SPARK-15777 (Partial fix) - Catalog federation
    • Make ExternalCatalog configurable beyond the in-memory and Hive implementations
    • FileIndex for catalog tables is provided by the external catalog instead of the default implementation
  • SPARK-33089 Enhance ExecutorPlugin API to include callbacks on task start and end events
    • Merged upstream; remove when we migrate to 3.1
  • SPARK-18079 - CollectLimitExec.executeToIterator should perform per-partition limits
  • SPARK-20952 - ParquetFileFormat should forward TaskContext to its forkjoinpool
  • SPARK-26626 - Limit the maximum size of repeatedly substituted aliases
  • SPARK-25200 - Allow setting HADOOP_CONF_DIR as a spark config
  • SafeLogging implemented for the following files:
    • core: Broadcast, CoarseGrainedExecutorBackend, CoarseGrainedSchedulerBackend, Executor, MapOutputTracker (partial), MemoryStore, SparkContext, TorrentBroadcast
    • kubernetes: ExecutorPodsAllocator, ExecutorPodsLifecycleManager, ExecutorPodsPollingSnapshotSource, ExecutorPodsSnapshot, ExecutorPodsWatchSnapshotSource, KubernetesClusterSchedulerBackend
    • yarn: YarnClusterSchedulerBackend, YarnSchedulerBackend
  • SPARK-20001 (SPARK-13587) - Support PythonRunner executing inside a Conda env (and R)
  • SPARK-21195 - Automatically register new metrics from sources and wire default registry
  • spark.sql.parquet.outputTimestampType defaults to INT64 (TIMESTAMP_MICROS) instead of upstream's INT96
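
To illustrate the idea behind the SPARK-18079 entry above (per-partition limits), here is a minimal sketch in plain Python. It is not Spark's implementation: the function name `collect_limit` is hypothetical, and ordinary lists stand in for partitions. It only shows the general technique of capping each partition at the limit before applying the limit to the combined stream.

```python
from itertools import chain, islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def collect_limit(partitions: Iterable[Iterable[T]], limit: int) -> Iterator[T]:
    """Yield at most `limit` rows, pulling at most `limit` rows from any one partition.

    Hypothetical helper for illustration only; not part of any Spark API.
    """
    # Per-partition limit: cap every partition before rows are combined.
    capped = (islice(partition, limit) for partition in partitions)
    # Global limit: cap the concatenated stream of capped partitions.
    return islice(chain.from_iterable(capped), limit)


# Made-up data: three "partitions" of a small table.
partitions = [[1, 2, 3, 4], [5, 6], [7, 8, 9]]
rows = list(collect_limit(partitions, 3))
```

Because both caps are lazy, no partition is read past the limit, and later partitions are not read at all once enough rows have been produced.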

Added