Contributors by email and wordcloud charts have inconsistent results #195

Open
smacker opened this issue Jul 3, 2019 · 9 comments
Labels: bug (Something isn't working), triage/needs-product-input (This needs input from product)

Comments

@smacker
Contributor

smacker commented Jul 3, 2019

Screenshot 2019-07-03 at 17 11 47

You can see Maximo and Miguel on the left but not on the right. Why?

@dpordomingo
Contributor

From my empathy session:
https://github.com/src-d/empathy-sessions/issues/38#issuecomment-497744544

When the word cloud chart is in a dashboard, it depends on the chart width to work properly: big words disappear if they do not fit in the chart, which is wrong.
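To make the failure mode concrete, here is a minimal sketch of the kind of fit check that would make a big word vanish. The 0.6 glyph-width ratio and both function names are assumptions for illustration only; they are not taken from Superset or from the layout library it uses.

```python
# Assumption: an average glyph is about 0.6x the font size wide.
# This is a rough rule of thumb, not a value read from Superset.
AVG_GLYPH_WIDTH_RATIO = 0.6


def estimated_text_width(word: str, font_size_px: float) -> float:
    """Rough pixel width of `word` when rendered at `font_size_px`."""
    return len(word) * font_size_px * AVG_GLYPH_WIDTH_RATIO


def fits_in_chart(word: str, font_size_px: float, chart_width_px: float) -> bool:
    """True if the word's estimated width does not exceed the chart width."""
    return estimated_text_width(word, font_size_px) <= chart_width_px
```

Under these assumptions, a 24-character email at 50px needs roughly 720px, so it would not fit in a 400px-wide chart, while the same email at 20px would.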

@gomesfernanda
Contributor

gomesfernanda commented Jul 23, 2019

What @dpordomingo said is right: sometimes there's no space for the larger words (larger contributors).

After talking to @smacker and trying different things, one thing we can try (but I cannot confirm this will work 100% of the time) is to limit the number of words (records) to 50 and rename the title to "Top 50 contributors by e-mail"

image
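As a sketch of what "limit to 50" means on the data side, assuming the chart receives one email per commit (the function name and input shape are hypothetical; in the dashboard this would presumably be a row limit on the chart or a limit in the query):

```python
from collections import Counter


def top_contributors(commit_emails, limit=50):
    """Return the `limit` most frequent emails as (email, commits) pairs,
    ordered from most to fewest commits."""
    return Counter(commit_emails).most_common(limit)
```

This only reduces how many words compete for space; it does not by itself guarantee that the largest word fits.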

@gomesfernanda
Contributor

Of course, this is not an optimal or definitive solution, but it does the trick for this chart. We should discuss what the final solution would be; I don't believe it will involve only query and interface settings, but rather changing Superset itself.

@dpordomingo
Contributor

I do not see the benefit of showing more than 50 developers (or even fewer) in the pie chart.
About the word cloud... the same problem can appear, especially with long emails, even if we limit it to 20 items.

Here is what I see with limiting by 20, and different resolutions:

1100px:
image
1879px:
image

It turns out that even with only 20 developers, the top 3 committers do not appear at some resolutions.

I wonder if it could be useful to truncate the email, e.g. to 15 chars.

image
image

As can be seen in the two screenshots above, they share the same screen resolution (1100px), but while the first attempt succeeded, the second failed to show the top committer.
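The truncation idea can be sketched as follows. The function name is hypothetical, and in the dashboard this would more likely be done in the query that feeds the chart:

```python
def truncate_email(email: str, max_chars: int = 15) -> str:
    """Keep at most `max_chars` leading characters of `email`."""
    return email[:max_chars]
```

One caveat with this approach: two different emails that share their first 15 characters become the same word, so their counts would need to be either merged or kept apart deliberately.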

Proposed solution

As far as I understand, the problem appears when there is a big difference between developers in the number of contributions and in the length of their names. So why not decrease the font sizes and use an 8-30 range instead of the current 20-50, which causes the 1st developer to be printed at a 50px font size?

I think a more reliable solution should include the three mentioned strategies (sorted by how relevant I found them):

  • reduce the font size range
  • limit the email length
  • limit the number of developers to 20~30 (not sure about this)

image

As can be seen, if the font range and email length are small enough, limiting the number of developers is not necessary.
image
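To illustrate why the font size range matters, here is a small sketch that assumes commit counts are mapped linearly onto the configured range. That linear mapping is an assumption made for illustration (the thread does not say how the chart scales words), and the function name is hypothetical.

```python
def font_size(count, min_count, max_count, min_px, max_px):
    """Linearly map a commit count onto the [min_px, max_px] font range."""
    if max_count == min_count:
        # All contributors have the same count: use the top of the range.
        return float(max_px)
    ratio = (count - min_count) / (max_count - min_count)
    return min_px + ratio * (max_px - min_px)
```

With this mapping the top contributor always gets the upper bound of the range, so moving from 20-50 to 8-30 shrinks the largest word from 50px to 30px, which makes it that much narrower and easier to place.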

@gomesfernanda
Contributor

Well, according to research (https://www.freshaddress.com/blog/long-email-addresses/) the average e-mail address has 21 characters, so we could truncate to 21 characters. Honestly, I don't know what a final solution for this problem would be, or whether what we're doing here solves it definitively.

@dpordomingo
Contributor

The last screenshot was created with a length of 15 and the font size range reduced to 8-30. With more than 15 chars and the current font size range of 20-50, our first contributor (MaximeBeauchemin) disappeared at most resolutions, and the next two top contributors also disappeared many times.

I'm not sure about the utility of this chart if we can not rely on its content with our current configuration.

@gomesfernanda
Contributor

I like the wordcloud because it's a different chart and brings a new type of visualization. On the Overview dashboard, I believe it helps to show the diversity of what the user can do, and it is also a breath of fresh air among pie and bar charts.

I also think that we cannot rely on this chart, but excluding it from the Overview dashboard doesn't solve what we face.

@dpordomingo
Contributor

If we want it, we should tune its config to work around its limitations with word lengths:

  • increase the available space of the chart itself,
  • maximum word length,
  • font-size range,
  • limit on the number of words.

From my tests, I found no silver bullet.

I wonder if it would be reasonable to invest time in analyzing whether there is something we could fix in the chart source code itself, so that the more relevant words are not removed from the output but resized to fit.
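A rough sketch of what "resized instead of removed" could look like, independent of the actual chart source code. The glyph-width ratio, the minimum font size, and the function name are all assumptions for illustration:

```python
# Assumption: an average glyph is about 0.6x the font size wide.
AVG_GLYPH_WIDTH_RATIO = 0.6


def shrink_to_fit(word, font_size_px, chart_width_px, min_font_px=8.0):
    """Return a font size at which `word` is estimated to fit the chart.

    The requested size is kept when the word already fits; otherwise it is
    reduced. Returns None if even `min_font_px` would be too wide.
    """
    if not word:
        return font_size_px
    widest_allowed = chart_width_px / (len(word) * AVG_GLYPH_WIDTH_RATIO)
    fitted = min(font_size_px, widest_allowed)
    return fitted if fitted >= min_font_px else None
```

The trade-off is that a shrunken word no longer reflects its true weight relative to the others, so the chart stays complete but becomes less accurate about proportions.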

@se7entyse7en added the bug (Something isn't working) and triage/needs-product-input (This needs input from product) labels on Oct 24, 2019
@se7entyse7en
Contributor

IMHO this is simply unreliable.

I mean, we could try to "fix it" for our default dashboard, but we do not have control over how the final user will use this type of visualization. Moreover, in this case we're using it for emails, but that's just our use case.

The first thing I'd do is at least prevent the top values from disappearing. There's probably an option for that, even if it means that the string is cropped. At least then we wouldn't show wrong information.
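A sketch of the cropping idea, where the label is cut to whatever is estimated to fit at its font size rather than being dropped. This is not an existing chart option as far as this thread shows; the glyph-width ratio and the function name are assumptions.

```python
# Assumption: an average glyph is about 0.6x the font size wide.
AVG_GLYPH_WIDTH_RATIO = 0.6


def crop_to_fit(label, font_size_px, chart_width_px, ellipsis="..."):
    """Crop `label` so its estimated width fits within the chart width."""
    max_chars = int(chart_width_px // (font_size_px * AVG_GLYPH_WIDTH_RATIO))
    if len(label) <= max_chars:
        return label
    if max_chars <= len(ellipsis):
        # Too narrow for an ellipsis: keep only the characters that fit.
        return label[:max(max_chars, 0)]
    return label[: max_chars - len(ellipsis)] + ellipsis
```

Unlike a fixed-length truncation, the crop length here depends on the font size, so the top contributor (largest font) is cropped the most but always stays visible.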

5 participants