42

Recently SE amended its terms of use to prevent automatic scraping of site information by third parties for information brokering (at least, that's what I gathered from it).

I just read about a new start-up that is scraping information from multiple sources to calculate how likely you are to leave your current job and selling this information to employers. What's concerning to me is that Stack Overflow is mentioned as a source of data by name in the article:

or use some niche industry sites, such as Stack Overflow, that combine aspects of social media and job boards

(source)

From what I can gather this app assigns you a "J-Score", which is supposed to indicate how likely it is that you're willing to leave your current job.

It follows that if the app treats any particular activity on SO as a sign that you are more likely to leave, this could have devastating effects on SO users. Companies might purchase that information and make layoff decisions based on it. If SE cannot prevent the data from being aggregated that way, this essentially means that activity on SE could lead to people losing their jobs, through no fault of their own. For employees of unreasonable employers, this could be an active incentive to stop using SE sites altogether.

Also, looking further into this, it seems like they are purchasing per-person information about what links you click through. Facebook is able to track this directly via cookies and embedded Facebook share buttons. Does SE have those in a way that allows this? If so, do we need to reconsider having them?

I strongly believe SE needs to endeavour to nip this kind of activity in the bud in order to prevent a chilling effect on site activity.

Is this software violating SE's terms of use? If so, can we expect SE to protect its users from it? Can it even do that?

22
  • 3
    So it begins...
    – Seth
    Commented Sep 12, 2016 at 7:41
  • 11
    "niche industry site", is that what we've become? Commented Sep 12, 2016 at 7:42
  • 1
    Well, it's not about the legal aspect; this can be really annoying for people who don't use nicknames everywhere.
    – Walfrat
    Commented Sep 12, 2016 at 8:21
  • @Walfrat I'm asking specifically whether SE will preemptively protect its users from data brokers like this, who seek to monetize our activity here in a way that only harms us.
    – Magisch
    Commented Sep 12, 2016 at 8:22
  • The title asks whether the app is violating SE's terms of service; legal people are probably better placed than SO mods to answer that. First we need to know whether this application currently violates SE's terms of service, and then what SE will do to ensure this won't happen. Because if it is permitted at the moment, the first step is to change the terms of service.
    – Walfrat
    Commented Sep 12, 2016 at 8:26
  • When you say app, can you clarify whether this is a listed app on stackapps that users can allow access to their account, or whether you're talking about some external "service" that happens to use the SE API/scrape SE sites to grab information? Commented Sep 12, 2016 at 8:31
  • 3
    @JonClements No, as described in the linked article, this is a third-party service that scrapes information en masse from SE. The service does not provide a way to opt out: your information is being taken and conclusions sold to your employer whether you like it or not.
    – Magisch
    Commented Sep 12, 2016 at 8:32
  • 1
    They're most likely going to use the metadata (e.g. your SO activity during a typical work day). The data is freely available in the data dump, it seems unlikely any ToS can protect us against this kind of use
    – Pekka
    Commented Sep 12, 2016 at 8:38
  • 1
    @Pëkka I have this strange image of them looking at that sort of data and the algorithms going: "Those CMs seem to spend all day on SE sites - they're obviously after a new job"... :) Commented Sep 12, 2016 at 8:40
  • @Jon yeah, there's going to be major false positives. Just look at Jon Skeet. I have, however, an eerie sense that for most SO users, there is a correlation between job satisfaction/engagement and SO activity.
    – Pekka
    Commented Sep 12, 2016 at 8:41
  • 1
    @Pëkka last time I looked at the survey data, quite a lot were consultants / self-employed/not looking for a new job or are otherwise "Full Stack Overflow Developers" - so errr... Relying on such results would be quite foolish. Commented Sep 12, 2016 at 8:44
  • 1
    @Magisch we all know that SE (especially SO) is like Hotel California - you can check out any time you like... :p Commented Sep 12, 2016 at 8:45
  • 1
    @Walfrat but such stuff should be posted anonymously even now. People who post stuff like that with their real name, it's their own fault if there's consequences at work
    – Pekka
    Commented Sep 12, 2016 at 8:58
  • 1
    @Pëkka true, but there are still people who just don't realize it, especially when they're very unfamiliar with the digital world; a warning explaining some of the possible consequences would still be welcome
    – Walfrat
    Commented Sep 12, 2016 at 9:03
  • 1
    A point about Facebook and the like: some people don't know it, but when you view a page that has a Facebook/Twitter/... "like" button, they already know you have visited the page. Even if you're not logged in, they can build a profile based on your IP, and then make the link when you log in from that IP. At least SE has no Facebook buttons and so on; everything is internal to SE except authentication.
    – Walfrat
    Commented Sep 12, 2016 at 9:47

2 Answers

15

What's available regarding how they're collecting and using the data is so very ambiguous that I can't even be certain if they're using your activity on the site in any meaningful way. The new rules pretty specifically restrict getting any information about you to whatever is exposed in the API.

As for Jobs, they have no way of identifying passive job seekers (as in "not looking, but open to being contacted"). They can only identify you as an active job seeker if you set that bit public, which I wouldn't recommend if you think that doing so might inconveniently expedite your departure from your current gig.

We're going to keep an eye on it to the extent that we can. We don't have any embedded social share buttons (we just use links that lead you to that functionality on their side). Of course, if you use Facebook to log in, well, they're going to know about it :)

For right now, there's just too little information to really say much, but we're going to keep an eye on it. If it turns out that this is in fact a new 'evil' use case, and it's positioned to interfere with how folks participate on the site, then I think we'd have a clearer idea of how to adjust the language as needed and possibly take action.

Right now we just don't know .. well .. much at all. I sincerely hope that someone over there isn't of the opinion that rep increases directly correspond to active job seeking, or every software company in existence is going to look like they're doing a bad job of retaining programmers :)

7

After a little looking, I figured out that the company calculating the J-Score™ does not obtain the data from Stack Overflow directly:

Joberate leverages only publicly available Social Data in calculating a person’s J-Score™, which means that there is no violation of anybody’s data privacy. Social Data is legally licensed from Social Media data resellers (i.e. GNIP, www.gnip.com) and otherwise obtained using Joberate’s patent-pending platform, which measures the intensity of a person’s job seeking activities, taking into consideration time, volume, and relevance when calculating a person’s J-score™.
Source

If you look at Gnip's page here, they list Stack Overflow. Gnip uses the Stack Exchange API, so I will assume all the Stack Exchange data was gathered by Gnip via the API.

You can get questions and answers from the API. Because of the license (CC BY-SA), which requires attribution, you will also get the username and profile image associated with each post.
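To make that concrete, here is a minimal sketch of what post data with attribution looks like to an API consumer. It assumes version 2.2 of the public API and its /questions endpoint; the sample response is hypothetical and trimmed to a few fields, and nothing is actually fetched. I don't know how Gnip or Joberate really query or process the data, so treat this purely as an illustration.

```python
import json
from urllib.parse import urlencode

# Public Stack Exchange API root (version 2.2 is an assumption of this sketch).
API_ROOT = "https://api.stackexchange.com/2.2"


def build_questions_url(site="stackoverflow", pagesize=5):
    """Build a URL for the /questions endpoint; nothing is fetched here."""
    query = urlencode({"site": site, "pagesize": pagesize})
    return f"{API_ROOT}/questions?{query}"


# Hypothetical response, trimmed to the fields relevant to attribution.
SAMPLE_RESPONSE = json.loads("""
{
  "items": [
    {
      "title": "Example question title",
      "link": "https://stackoverflow.com/questions/0/example",
      "owner": {
        "display_name": "example_user",
        "profile_image": "https://example.com/avatar.png"
      }
    }
  ]
}
""")


def extract_attribution(response):
    """Return (title, display_name, profile_image) for each post."""
    rows = []
    for item in response.get("items", []):
        owner = item.get("owner", {})
        rows.append(
            (item.get("title"), owner.get("display_name"), owner.get("profile_image"))
        )
    return rows


print(build_questions_url())
for row in extract_attribution(SAMPLE_RESPONSE):
    print(row)
```

The point is only that every post comes back with its owner's display name and avatar attached, so anyone consuming post data also receives that attribution.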

At first, I couldn't figure out what value post data has to Joberate. But I found a possibility:

In addition to J-Score, the platform performs psychological profiling based on the NLP (Natural Language Processing) of CV’s and Social Data.

Automate matching the ideal candidates based on company culture, candidate’s job seeking behavior, preferred communication style, and their availability.

Clearly there's some merit to this. It benefits everyone when it's a good fit; you don't want to realize you hate your job/company after you've invested time.


Legality

Let's look at some selections from our Terms of Service:

[N]o Profile Content, including API Profile Content, may be used in any way that implies a user is affiliated with, has signed up for, or is in any way associated with a third party without explicit permission from Stack Exchange or the user.


Under no circumstances will Subscriber use the Network or the Service to
(a) send unsolicited e-mails, bulk mail, spam or other materials to users of the Network or any other individual,
(b) harass, threaten, stalk or abuse any person or party, including other users of the Network...
(e) copy, download, or scrape any Personal Profile Content for the purpose of indexing software engineers, social recruiting, sourcing, employment-related services, compiling databases of employment solicitation targets, providing content for a hiring platform without the express permission of Stack Exchange or the User.


Here's my take on this.

  • That first section may or may not apply here. The software appears to only produce the score. A higher J-Score indicates that you may be looking to leave but it's not clear how Stack Exchange data fits into this. Without more details, as Tim Post says, it's hard to tell what they do with the data (besides machine learning), or how it goes into the score.

  • You might get emailed if you're using Joberate's system, but that is not bulk mail or unsolicited. Section (a) does not apply.

  • Section (b) might have something. Of course, without being contacted, you cannot be harassed or threatened, but stalking seems to fit. Unfortunately, I think that the legal definition may be a bit different:

Although stalking is illegal in most areas of the world, some of the actions that can contribute to stalking can be legal, such as gathering information, calling someone on the phone, sending gifts, emailing or instant messaging. They become illegal when they breach the legal definition of harassment e.g. an action such as sending a text is not usually illegal, but is illegal when frequently repeated to an unwilling recipient.

The Violence Against Women Act of 2005, amending a United States statute, 108 Stat. 1902 et seq, defined stalking as "engaging in a course of conduct directed at a specific person that would cause a reasonable person to—

(A) fear for his or her safety or the safety of others;
(B) suffer substantial emotional distress."

From Wikipedia

From that section, it's pretty clear that stalking legally refers to repeated harassment. "Information gathering" isn't illegal on its own, it would seem.

  • Sections (c) and (d) are excluded from my quote as I did not feel that impersonation or posting wrong information respectively were relevant here.

  • Section (e) also looks like it might have something. That is, until you remember that "Personal Profile Content" refers to:

Profile Content that is NOT available via the Stack Exchange API
