
data_length makes less sense when data is a nested dictionary rather than a json string #7030

Open
zachliu opened this issue Jun 24, 2024 · 1 comment


zachliu commented Jun 24, 2024

Issue Summary

Before this PR #6687, the data returned by query runners was a JSON string, so the data_length computed by len(data) made sense:

logger.info(
    "job=execute_query query_hash=%s ds_id=%d data_length=%s error=[%s]",
    self.query_hash,
    self.data_source_id,
    data and len(data),
    error,
)

But after #6687, data is a nested dictionary, and len(data) only gives the number of top-level keys. In most cases there are just two, "columns" and "rows", so data_length is almost always 2 and tells us nothing useful. A minimal sketch of the mismatch follows.
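
For illustration, a rough sketch of the problem (the exact shape of the post-#6687 result dict is assumed here):

import json

# Hypothetical query result in the post-#6687 shape: a dict, not a JSON string.
data = {
    "columns": [{"name": "id", "type": "integer"}],
    "rows": [{"id": 1}, {"id": 2}, {"id": 3}],
}

print(len(data))              # 2 -- just the number of top-level keys
print(len(json.dumps(data)))  # length of the serialized string, i.e. the old metric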

Steps to Reproduce

Search for data_length= in your logs.
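
Given the format string above, a matching log line looks roughly like this (values are illustrative):

job=execute_query query_hash=abc123 ds_id=1 data_length=2 error=[None]

Note that data_length is 2 regardless of how many rows the query returned.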

Technical details:

  • Redash Version: 24.06.0-dev

zachliu commented Jun 27, 2024

I replaced len(data) with:

import sys
from collections import deque


def _get_size_iterative(dict_obj):
    """Iteratively finds size of objects in bytes"""
    seen = set()  # ids of objects already counted, so shared references aren't double-counted
    size = 0
    objects = deque([dict_obj])

    while objects:
        current = objects.popleft()
        if id(current) in seen:
            continue
        seen.add(id(current))
        size += sys.getsizeof(current)

        if isinstance(current, dict):
            # Walk both keys and values of nested dicts
            objects.extend(current.keys())
            objects.extend(current.values())
        elif hasattr(current, "__dict__"):
            # Instances: walk their attribute dict
            objects.append(current.__dict__)
        elif hasattr(current, "__iter__") and not isinstance(current, (str, bytes, bytearray)):
            # Other iterables (lists, tuples, sets), excluding string-like leaves
            objects.extend(current)

    return size

It works fine. The in-memory size of a dictionary is usually much larger than its on-disk size (e.g. as a CSV file) because of Python's per-object storage overhead, but it at least gives a relative measure. That is especially useful for me because I feed data_length into a DataDog dashboard to monitor the size of users' query results. A sketch of the call site follows.
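
For context, a minimal sketch of the replacement at the logging call site (the surrounding code is assumed to match the snippet from the issue summary):

logger.info(
    "job=execute_query query_hash=%s ds_id=%d data_length=%s error=[%s]",
    self.query_hash,
    self.data_source_id,
    data and _get_size_iterative(data),  # deep in-memory size in bytes instead of len(data)
    error,
)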
