I recently learned about the np.select operation and decided to create a class to experiment with it and also to learn a bit more about OOP. I know how to import and use the datetime library, but in this construction it gives me an error. Here is the definition of my class, followed by an example (the class uses the translate function defined at the beginning).

Like RDD, DataFrame also has operations like Transformations and Actions. A DataFrame is conceptually equivalent to a table in a relational database or a data frame in R/Python, but with richer optimizations under the hood. In other words, pandas DataFrames run operations on a single node whereas PySpark runs on multiple machines.

I have this data as output when I perform timeStamp_df.head() in PySpark:

Row(timeStamp='ISODate(2020-06-03T11:30:16.900+0000)', timeStamp='ISODate(2020-06-03T11:30:16.900+0000)', timeStamp='ISODate(2020-06-03T11:30:16.900+0000)', timeStamp='ISODate(2020-05-03T11:30:16.900+0000)', timeStamp='ISODate(2020-04-03T11:30:16.900+0000)')

I'm using Pydantic together with a foreach writer in PySpark structured streaming to validate incoming events. One of the fields of the incoming events is a timestamp.

I have a data frame that looks as below (there are in total about 20 different codes, each represented by a letter); now I want to update the data frame by adding a description to each of the codes.

Related questions that come up in the same context: column alias after groupBy in PySpark, and how to add a suffix and prefix to all columns in a Python/PySpark DataFrame. From the DataFrame.rename documentation: index is an alternative to specifying axis (mapper, axis=0 is equivalent to index=mapper), and in case of a MultiIndex, only labels in the specified level are renamed.

Solution: NameError: Name 'Spark' is not Defined in PySpark — try defining the spark variable first. The simplest way to create a DataFrame is from a Python list of data (a dict can contain Series, arrays, constants, or list-like objects; if data is a dict, argument order is maintained for Python 3.6 and later). That would fix it, but next you might get NameError: name 'IntegerType' is not defined or NameError: name 'StringType' is not defined. To avoid all of that, just do from pyspark.sql.types import *. A related error, NameError: name 'reduce' is not defined, appears because in Python 3 reduce has moved to functools and must be imported from there.

For NameError: name 'col' is not defined, a workaround is to import functions and call the col function from there. And no, there is no when method on DataFrames.
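Like col, when is a function in pyspark.sql.functions rather than a DataFrame method, so the same aliased import covers both. A minimal sketch of that workaround (the DataFrame df and the column my_column are placeholders):

from pyspark.sql import functions as F

# Referencing col and when through the functions module avoids the NameError,
# since these names are never injected into the global namespace automatically.
df2 = df.withColumn(
    "flag",
    F.when(F.col("my_column") > 0, "positive").otherwise("non-positive"),
)
df2.show()

Importing the module under an alias (F) also keeps wildcard imports out of the file, which tends to keep name clashes and IDE warnings to a minimum.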
Giving your imported module an alias (pd) does not automatically import the module's namespace; you still need to do df = pd.DataFrame(d).

1 Answer: It seems that you are repeating very similar questions. The problem is indeed that when has not been imported — you're thinking of where.

What is a PySpark DataFrame? The DataFrame definition is very well explained by Databricks, hence I do not want to define it again and confuse you: a DataFrame is a distributed collection of data organized into named columns.

PySpark: NameError: name 'col' is not defined.

From the rename documentation: function / dict values must be unique (1-to-1), and if errors='raise', a KeyError is raised when a dict-like mapper, index, or columns contains labels that are not present in the Index being transformed. Note that if data is a pandas DataFrame, a Spark DataFrame, or a pandas-on-Spark Series, other arguments should not be used.

Use the DataFrame column alias method. Now let us check these methods with examples.

Here is my try. I am trying to find the length of a DataFrame column, and I am running the following code:

from pyspark.sql.functions import *

def check_field_length(dataframe: object, name: str, required_length: int):
    dataframe.where(length(col(name)) >= required_length).show()

Less code to paw through equals happy reviewers.

I have a DataFrame with a single column but multiple rows; I'm trying to iterate the rows, run a line of SQL code on each row, and add a column with the result.

Problem: when I am using spark.createDataFrame() I am getting NameError: Name 'Spark' is not Defined, yet if I use the same in the Spark or PySpark shell it works without issue.

Lastly, we need to apply the defined schema to the RDD, enabling PySpark to interpret the data and generate a data frame with the desired structure. Here is a potential solution: read the file using the textFile() method to load it as an RDD (Resilient Distributed Dataset); this will allow you to process each line.
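A minimal sketch of that RDD-plus-schema approach, assuming a hypothetical comma-separated file /tmp/people.txt with name and age fields:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName("rdd-to-df").getOrCreate()

# Load the file as an RDD of lines, then split and convert each line.
rdd = (spark.sparkContext.textFile("/tmp/people.txt")
       .map(lambda line: line.split(","))
       .map(lambda parts: (parts[0], int(parts[1]))))

# Define the schema and apply it to the RDD to get a DataFrame.
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True),
])
df = spark.createDataFrame(rdd, schema)
df.printSchema()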
DataFrame has a rich set of APIs which support reading and writing several file formats, and a DataFrame can also be created from an RDD or by reading files from several sources. I got the idea by looking into the PySpark code, as I found that reading CSV was working in the interactive shell.

If you are coming from a Python background I would assume you already know what a pandas DataFrame is; a PySpark DataFrame is mostly similar to a pandas DataFrame, with the exception that PySpark DataFrames are distributed in the cluster (meaning the data in a DataFrame is stored on different machines in the cluster) and any operation in PySpark executes in parallel on all machines, whereas a pandas DataFrame stores and operates on a single machine. Due to parallel execution on all cores on multiple machines, PySpark runs operations faster than pandas.

If you want to keep your code the way it is, use from pandas import * instead of an alias. Pydantic is able to handle datetime values, according to their docs.

When clause in PySpark gives the error "name 'when' is not defined": with the below code I am getting the error message name 'when' is not defined. You didn't define the dataframe df.

from pyspark.sql.types import StructType — alternatively, import all the types you require one by one. I recommend the former.

How to rename a column name of a DataFrame in PySpark? How can I achieve this? Following are some methods that you can use to rename DataFrame columns in PySpark. We are not replacing or converting the DataFrame column data type. From pyspark.pandas.DataFrame.rename: the mapper is a dict-like or function transformation to apply to that axis' values; labels not contained in a dict/Series are left as-is, and extra labels listed don't throw an error; level is an int or level name (default None), and in case of a MultiIndex only labels in the specified level are renamed; a KeyError is raised if any of the labels is not found in the selected axis and errors='raise'.

Currently I have the SQL working and returning the expected result when I hard-code just one single value, but I am now trying to extend it by looping through all rows in the column.

Convert a PySpark string to date format: I have a PySpark dataframe with a string column in the format MM-dd-yyyy, and I am attempting to convert this into a date column. (And consider trimming down the example.) I tried:

df.select(to_date(df.STRING_COLUMN).alias('new_date')).show()
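Without an explicit pattern, to_date() expects an ISO-style yyyy-MM-dd string, so for MM-dd-yyyy data you would pass the format as the second argument. A minimal sketch, reusing the STRING_COLUMN name from the question:

from pyspark.sql.functions import to_date, col

# Parse the MM-dd-yyyy string column into a proper DateType column.
df = df.withColumn("new_date", to_date(col("STRING_COLUMN"), "MM-dd-yyyy"))
df.select("STRING_COLUMN", "new_date").show()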
PySpark, update value in multiple rows based on condition. Below is the definition I took from Databricks. If you have no Python background, I would recommend you learn some basics of Python before proceeding with this Spark tutorial.

More rename parameters: axis is the axis to target with the mapper and can be either the axis name ('index', 'columns') or a number (0, 1); columns is an alternative to specifying axis (mapper, axis=1 is equivalent to columns=mapper); if errors='ignore', existing keys will be renamed and extra keys will be ignored. For the DataFrame constructor, index is an Index or array-like giving the index to use for the resulting frame.

In PyCharm the col function and others are flagged as "not found". Importing the module under an alias avoids this, for example:

from pyspark.sql import functions as F
df.select(F.col("my_column"))

df.persist fails with name errors:

df.persist(pyspark.StorageLevel.MEMORY_ONLY)
NameError: name 'MEMORY_ONLY' is not defined

df.persist(StorageLevel.MEMORY_ONLY)
NameError: name 'StorageLevel' is not defined

import org.apache.spark.storage.StorageLevel
ImportError: No module named org.apache.spark.storage.StorageLevel

Any help would be greatly appreciated.

By using the createDataFrame() function of the SparkSession you can create a DataFrame. I got it to work by using the following imports:

from pyspark import SparkConf
from pyspark.context import SparkContext
from pyspark.sql import SparkSession, SQLContext

You can add

from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession

sc = SparkContext('local')
spark = SparkSession(sc)

to the beginning of your code to define a SparkSession; then spark.createDataFrame() should work. Step 4: apply the schema to the RDD and create a data frame.

Renaming column names of a DataFrame in Spark Scala. Use the withColumnRenamed function to rename individual columns, or the toDF function to rename all columns in a DataFrame. Below is an example of how to read a CSV file from a local system and apply these renaming methods.
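A sketch of that combination — reading a local CSV and then renaming columns three different ways. The path /tmp/people.csv and the column names are invented for illustration:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("rename-example").getOrCreate()

# Read a CSV file from the local file system, letting Spark infer the schema.
df = spark.read.csv("/tmp/people.csv", header=True, inferSchema=True)

# 1. withColumnRenamed: rename a single column.
df1 = df.withColumnRenamed("dob", "date_of_birth")

# 2. toDF: rename all columns at once, by position (assuming three columns here).
df2 = df.toDF("first_name", "last_name", "date_of_birth")

# 3. select + alias: rename while selecting.
df3 = df.select(col("dob").alias("date_of_birth"))

df1.printSchema()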
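As for the persist errors quoted above: in PySpark, StorageLevel is exposed by the pyspark package itself (org.apache.spark.storage is the JVM-side path and cannot be imported from Python). A minimal sketch:

from pyspark import StorageLevel

# Import StorageLevel from the pyspark package, then pass a level to persist().
df.persist(StorageLevel.MEMORY_ONLY)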
Rounding out the rename parameters: errors accepts {'ignore', 'raise'} and defaults to 'ignore'.

Since DataFrames are a structured format containing column names and types, we can get the schema of a DataFrame using df.printSchema(), and df.show() displays the first 20 rows of the DataFrame by default.
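Putting the DataFrame basics together, a small sketch that builds a DataFrame from a Python list and inspects it (the data and column names are invented for illustration):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("example").getOrCreate()

# The simplest way to create a DataFrame: a Python list of tuples plus column names.
data = [("James", 30), ("Anna", 25), ("Robert", 41)]
df = spark.createDataFrame(data, ["name", "age"])

df.printSchema()   # prints the inferred schema: name (string), age (long)
df.show()          # displays up to 20 rows by default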