Property talk:P6366
Documentation
identifier for an object or topic in the Microsoft Academic Graph (until 31 December 2021)
List of violations of this constraint: Database reports/Constraint violations/P6366#Single value, SPARQL
List of violations of this constraint: Database reports/Constraint violations/P6366#Unique value, SPARQL (every item), SPARQL (by value)
List of violations of this constraint: Database reports/Constraint violations/P6366#Format, SPARQL
List of violations of this constraint: Database reports/Constraint violations/P6366#Conflicts with P31, SPARQL
List of violations of this constraint: Database reports/Constraint violations/P6366#Scope, SPARQL
List of violations of this constraint: Database reports/Constraint violations/P6366#Entity types
List of violations of this constraint: Database reports/Constraint violations/P6366#Type Q35120, SPARQL
MAG Data Access
Volumetrics
- 230k fields of study (FOS, topic) with ID and Hierarchy https://docs.microsoft.com/en-us/azure/cognitive-services/academic-knowledge/fieldsofstudyentityattributes, https://academic.microsoft.com/#/topics/0/
- 25.4k affiliations (research organizations) https://docs.microsoft.com/en-us/azure/cognitive-services/academic-knowledge/affiliationentityattributes, https://academic.microsoft.com/#/institutions/0/
- 209M papers+patents https://docs.microsoft.com/en-us/azure/cognitive-services/academic-knowledge/paperentityattributes
- 252M authors: https://docs.microsoft.com/en-us/azure/cognitive-services/academic-knowledge/authorentityattributes. Not all are deduplicated, but can be linked (claimed) by the individual researcher. Eg I had about 15 records until I claimed them and they got merged. Link to CrunchBase, a database of companies/startups
- 48.6k Journals with rank
- 4.3k Conferences with rank
Microsoft Relational dump:
- How to get it from Microsoft Azure: https://docs.microsoft.com/en-us/academic-services/graph/get-started-setup-provisioning
- Schema: https://docs.microsoft.com/en-us/academic-services/graph/reference-data-schema
University of Freiburg: MAG as RDF
- http://ma-graph.org/. Author: Michael Färber, University of Freiburg
- http://doi.org/10.5281/zenodo.2159723 (https://zenodo.org/record/2159723), more versions at http://ma-graph.org/rdf-dumps/. Periodically updated RDF dump files of the whole MAG
- 8 billion triples, 86-91 GB as nt.bz2, 1.2 TB uncompressed. Version 2018-11-09
- URI resolution and HTML page descriptions of MAG resources via pubby, eg
- Has some omissions compared to the MAG relational dump. The crucial one is that the relation Paper-Author-Affiliation is missing, replaced with a direct link Paper-Author
Open Academic Graph:
- https://www.openacademic.ai/oag/, dump in simple json format. Integrates the following:
- 166M papers from Microsoft Academic Graph (MAG): snapshot of 2017-06-09 (MAG is updated weekly). 104GB
- 154M papers from Arnet Miner: snapshot of 2017-03-22. 39GB
- 64M common (linking relations) with 99.7% precision. 1.6GB
Microsoft Graph API:
- Entity model is poorer than the relational data: https://docs.microsoft.com/en-us/azure/cognitive-services/academic-knowledge/graphsearchmethod#graph-schema, https://docs.microsoft.com/en-us/azure/cognitive-services/academic-knowledge/entityattributes
- powerful paper search expression language (for the "evaluate" call)
- powerful interpret method for entering NLP questions with auto-completion (eg "papers of vladimir alexiev" or "papers citing Information integration with ontologies"), translating to search expressions
- powerful graph search/navigation using LINQ Lambda expressions or a JSON navigation syntax
--Vladimir Alexiev (talk) 08:55, 18 January 2019 (UTC)
- @Vladimir Alexiev: I have uploaded ids for institutions by matching them using the GRID ID (P2427) provided in http://ma-graph.org/ (https://tools.wmflabs.org/editgroups/b/OR/59c3cc81/), but Jklamo noticed that this GRID mapping is not always reliable − sometimes they picked the GRID ids of national subsidiaries of international companies. I am fixing this now. − Pintoch (talk) 14:10, 20 January 2019 (UTC)
Duplicates
Many IDs are now being added to chemical substances and I see a lot of duplicates on the Microsoft Academic website, e.g. [1], [2]. Both are in one item lead(II) chromate (Q367871). What should be done about it as we have a single-value constraint (Q19474404) here? Is it possible to mark two entries in Microsoft Academic as duplicates and maybe MS Academic staff would merge them? Or maybe we should deprecate one ID? Add an item as exception to constraint (P2303)? Pinging Nikola Tulechki as I see you're currently adding many IDs via QS. Wostr (talk) 13:52, 13 February 2020 (UTC)
- @Wostr: IMHO the best option regarding the example you give is to merge the items at the MAG end. Given that I am currently importing all the 230K FOS from the latest MAG dump through the Wikipedia links, I suggest that we wait until the import finishes and produce a report with which to engage the MAG people. This way we can also incite them to add direct mapping to Wikidata on their app in complement of (or instead of) the Wikipedia links they currently have. Nikola Tulechki (talk) 14:08, 14 February 2020 (UTC)
- Constraint violation reports will be generated automatically and can serve as lists of duplicates for MAG staff to merge. So please don't add Exception to constraint. --Vladimir Alexiev (talk) 14:37, 14 February 2020 (UTC)
- It cannot serve as a duplicate list without checking the list first. There are situations like this (from my watchlist from today):
- Some IDs are not functional: cholesteryl ester (Q415555) with [3] and [4] (which redirects to the home page)
- Some IDs are not 1:1 equivalents of the WD items, like in D-mannopyranose (Q335208): [5] and [6] – there are many such situations; the last one should probably be in mannose metabolic process (Q21121642). I don't know what the scale of that issue is (every day I see at least several cases like this on my watchlist alone), but it seems it would require much effort to correctly map these to the WD items before it would be possible to have a list of real duplicates.
- Wostr (talk) 17:17, 14 February 2020 (UTC)
- Okay, as all the IDs are imported now, how to deal with (real) duplicates? With over 9k violations of 'single value' constraint it's not possible to leave them as they are, because it will disrupt checking the constraint violations list (on which there are not only real duplicates but also IDs that are not equivalent to WD items). Maybe using deprecated rank for one ID with duplicate entry (Q1263068) as a reason for deprecation? Or using mapping relation type (P4390)? Or maybe MS Academic staff could check our list of constraint violations, merge the IDs on their end and provide a list of merged IDs? Wostr (talk) 20:14, 24 June 2020 (UTC)
- @Wostr: Actually I think the single value constraint should be removed because the entries at Microsoft are not checked for aliases themselves. They just take strings and make them entries. --SCIdude (talk) 08:10, 2 September 2020 (UTC)
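For anyone who wants to pre-screen candidates before the report is checked by hand, here is a minimal Python sketch of the single-value check discussed above. It assumes the statements are already available as (item, MAG ID) pairs; the function name and the numeric IDs in the usage lines are hypothetical placeholders, not real MAG identifiers. As noted in the thread, a hit is only a candidate: the extra ID may be non-functional or may not be a 1:1 match for the item.

```python
from collections import defaultdict


def single_value_violations(statements):
    """Return items that carry more than one distinct MAG ID.

    `statements` is an iterable of (item_qid, mag_id) pairs.
    The result maps each violating item to its sorted list of IDs.
    """
    ids_by_item = defaultdict(set)
    for item_qid, mag_id in statements:
        ids_by_item[item_qid].add(mag_id)
    return {
        item_qid: sorted(ids)
        for item_qid, ids in ids_by_item.items()
        if len(ids) > 1
    }


# Usage with placeholder IDs (hypothetical, for illustration only).
example = [
    ("Q367871", "1000000001"),
    ("Q367871", "1000000002"),
    ("Q415555", "1000000003"),
]
candidates = single_value_violations(example)
# candidates == {"Q367871": ["1000000001", "1000000002"]}
```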
Microsoft Academic Website: No longer accessible after Dec. 31, 2021
Microsoft have announced that they will be retiring this service: [7]. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:00, 14 May 2021 (UTC)
- This has now come to pass. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:28, 1 January 2022 (UTC)
Worth noting here that while the MAG project is retired, OpenAlex has effectively continued the project under a new, more permissive license (CC Zero). In particular, OpenAlex was seeded with the MAG metadata, using the same old MAG integer identifiers, and there is a related Wikidata property for OpenAlex IDs (https://www.wikidata.org/wiki/Property:P10283). One difference is that OpenAlex IDs have a letter prefix (like 'W' for works), which the MAG identifiers did not, so you need to know the entity type to map from old MAG identifiers to new OpenAlex identifiers. Blnewbold (talk) 01:20, 13 September 2022 (UTC)
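The prefix mapping described above could be sketched roughly as follows in Python. Only the 'W' prefix for works comes from the comment above; the other prefixes in the table, the function name, and the example integer are assumptions for illustration and should be checked against the OpenAlex documentation before use. It is also an assumption that every entity type kept its MAG integer.

```python
# Entity type -> letter prefix. Only "W" (works) is stated in the comment
# above; the remaining entries are assumptions to be verified.
OPENALEX_PREFIX = {
    "work": "W",
    "author": "A",        # assumption
    "institution": "I",   # assumption
    "concept": "C",       # assumption
}


def mag_to_openalex(mag_id, entity_type):
    """Build a candidate OpenAlex ID from a MAG integer ID and an entity type."""
    mag_str = str(mag_id).strip()
    if not (mag_str.isascii() and mag_str.isdigit()):
        raise ValueError(f"MAG ID must be a plain integer, got {mag_id!r}")
    if entity_type not in OPENALEX_PREFIX:
        raise ValueError(f"Unknown entity type: {entity_type!r}")
    return OPENALEX_PREFIX[entity_type] + mag_str


# Usage (the integer is a placeholder, not a checked identifier):
print(mag_to_openalex(1234567890, "work"))  # prints W1234567890
```

The entity type has to come from somewhere else, for example the P31 value of the Wikidata item, because the bare MAG integer does not encode it.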
formatter url not available
Apparently this doesn't work on archive.org (see discussion on project chat and properties for deletion). Accordingly I deprecated the formatter url. --- Jura 14:49, 7 January 2022 (UTC)
- Just as an update for posterity: Those discussions have been archived and are located here: Project chat/Archive/2022/01 and Properties for deletion/Archive/2022/2. – desoda (T | C) 19:04, 26 December 2022 (UTC)