Changing maxCharsPerColumn doesn't do anything #85

Open
MattiaCostamagna opened this issue Jan 29, 2024 · 1 comment

Comments

@MattiaCostamagna

I'm using the Spark Salesforce connector (spark-salesforce_2.12 v1.1.4) to read data from Salesforce in my AWS Glue job.
To get it working I followed the official AWS guide, which says I have to add the following dependencies to the job for the connector to work:

  • force-partner-api-40.0.0.jar
  • force-wsc-40.0.0.jar
  • salesforce-wave-api-1.0.9.jar
  • spark-salesforce_2.11-1.1.1.jar

I decided to use the latest version of each dependency, so this is my configuration now (a quick way to check which jars the job actually picks up is sketched after the list):

  • force-partner-api-60.0.0.jar
  • force-wsc-60.0.0.jar
  • salesforce-wave-api-1.0.10.jar
  • spark-salesforce_2.12-1.1.4.jar
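
Since the jars are attached to the Glue job through its --extra-jars parameter (an assumption about the setup; that is Glue's usual mechanism for dependent JARs), the sketch below reads the job definition back with boto3 and prints the jars the job is actually configured with. The job name is a placeholder:

    import boto3

    # Read the job definition back and print the dependent jars it is configured
    # with; "my-salesforce-job" is a hypothetical name, replace it with the real one.
    glue = boto3.client("glue")
    job = glue.get_job(JobName="my-salesforce-job")["Job"]
    print(job.get("DefaultArguments", {}).get("--extra-jars", "<no extra jars configured>"))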

I'm also using bulk queries, but I'm getting the following error:

com.univocity.parsers.common.TextParsingException: Length of parsed input (4097) exceeds the maximum number of characters defined in your parser settings (4096).

I went through the documentation and saw that there is a parameter called maxCharsPerColumn that should fix this problem, so I changed my read call to the following:

    df = spark.read\
        .format("com.springml.spark.salesforce")\
        .option("soql", soql)\
        .option("username", "xxxxxx")\
        .option("password", "yyyyyy")\
        .option("login", "zzzzzz")\
        .option("version", "52.0")\
        .option("bulk", "true")\
        .option("maxCharsPerColumn", "8192")\
        .option("sfObject", data_source)\
        .load()

Even with that option set, the result doesn't change at all.
The logs also show the CsvParserSettings passed by the connector, and the maximum number of characters per column is still 4096:

Parser Configuration: CsvParserSettings:
	Auto configuration enabled=true
	Auto-closing enabled=true
	Autodetect column delimiter=false
	Autodetect quotes=false
	Column reordering enabled=true
	Delimiters for detection=null
	Empty value=null
	Escape unquoted values=false
	Header extraction enabled=null
	Headers=null
	Ignore leading whitespaces=true
	Ignore leading whitespaces in quotes=false
	Ignore trailing whitespaces=true
	Ignore trailing whitespaces in quotes=false
	Input buffer size=1048576
	Input reading on separate thread=true
	Keep escape sequences=false
	Keep quotes=false
	Length of content displayed on error=-1
	Line separator detection enabled=true
	Maximum number of characters per column=4096
	Maximum number of columns=512
	Normalize escaped line separators=true
	Null value=null
	Number of records to read=all
	Processor=none
	Restricting data in exceptions=false
	RowProcessor error handler=null
	Selected fields=none
	Skip bits as whitespace=true
	Skip empty lines=true
	Unescaped quote handling=null
Format configuration:
	CsvFormat:
		Comment character=#
		Field delimiter=,
		Line separator (normalized)= 
		Line separator sequence=\n
		Quote character="
		Quote escape character="
		Quote escape escape character=null

Is there anything wrong with my code?
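
For comparison (an aside): Spark's built-in CSV data source accepts an option with the same name and passes it to univocity's setMaxCharsPerColumn, which is the behaviour I would expect the connector's option to mirror. A minimal sketch, with a placeholder S3 path:

    # Spark's own CSV reader hands maxCharsPerColumn to univocity, so 8192
    # raises the per-column limit there as expected.
    csv_df = spark.read\
        .format("csv")\
        .option("header", "true")\
        .option("maxCharsPerColumn", "8192")\
        .load("s3://my-bucket/example.csv")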

@MattiaCostamagna
Author

I haven't found a way to make it work, but for the moment I have a workaround: use the jar versions specified in the AWS blog article and run the job on Glue 2.0 (which supports Spark 2.4, Scala 2, and Python 3). These are the jar versions:

  • force-partner-api-40.0.0.jar
  • force-wsc-40.0.0.jar
  • salesforce-wave-api-1.0.9.jar
  • spark-salesforce_2.11-1.1.1.jar

And this is the code to read from Salesforce:

    df = spark.read\
        .format("com.springml.spark.salesforce")\
        .option("soql", soql)\
        .option("username", db_username)\
        .option("password", db_password_token)\
        .option("login", "xxx")\
        .option("version", "52.0")\
        .option("sfObject", data_source).load()

As you can see, I removed the bulk option, and I can now read 137,447 records with 120 columns each in less than 5 minutes.
As I said, this is only a workaround, because I'm currently using an older version of pretty much everything.
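
In case it helps anyone hitting the same limit: once a DataFrame does load (for example with the workaround above), the columns that would trip univocity's 4096-character default can be found by measuring the longest value per column. This is plain PySpark, nothing connector-specific, and it assumes the df from the snippet above:

    from pyspark.sql import functions as F

    # Longest observed string length per column; any column whose maximum
    # exceeds 4096 is one that would hit the parser's default limit in bulk mode.
    longest = df.select([
        F.max(F.length(F.col(c).cast("string"))).alias(c) for c in df.columns
    ])
    longest.show(truncate=False)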
