Testwiki:Request a query/Archive/2024/09

All items that have a P1705 for Skolt Saami and a P4119 value

I've been trying to get a list of all the items on Wikidata that have a Template:P value in Template:Q and also have a Template:P value. Someone imported all of these as part of a batch in which everything else is correct but these particular values usually are not, so I don't want to just revert the batch. I haven't figured out how to restrict Template:P to only the Template:Q values, so I'd appreciate the help. - Yupik (talk) 16:37, 1 September 2024 (UTC)

Template:Re Template:SPARQL

Does this return what you want? Mahir256 (talk) 16:43, 1 September 2024 (UTC)

Yes, thank you! - Yupik (talk) 16:45, 1 September 2024 (UTC)

WDQS being weird (railway junctions)

Can anybody spot what's going on here?

Template:SPARQL2

The query (almost) all works as it should: it finds railway junctions which might have Template:P statements but which say that there are actually better items for those statements, lists the railway-line sections passing through each junction, and counts how many of those sections are described by an external source, Template:Q, which happens to describe only currently existing ones.

The exception is that the Template:P property works in one place but has to be commented out in the other.

Any thoughts to explain this? Jheald (talk) 16:53, 5 September 2024 (UTC)

Labels for scholarly articles

I took my very simplest query to try to get my head round federated queries. I am simply looking for the count of different types of thesis at an institution. I'm not getting the labels for the thesis types, even though I think those labels must be in the scholarly subgraph. What am I doing wrong?

Template:SPARQL2 DrThneed (talk) 23:26, 4 September 2024 (UTC)

No, the labels for the types are in the main graph, so this works:

Template:SPARQL2

The test link doesn't work, though; the template needs updating so that it can point to the scholarly query service. Here's a working short link: https://w.wiki/B6j4 Ainali (talk) 09:32, 5 September 2024 (UTC)

Oh, I should have thought of that. Thanks Jan. *Individual theses* would have a label in the scholarly subgraph, but not the subclasses, right? DrThneed (talk) 20:30, 5 September 2024 (UTC)

OK, I thought that was fine at first glance, but now I see the counts are completely different!

The query is returning 1654 master's theses for Lincoln University on the main graph and 94980 on the scholarly subgraph! The 1654 is the correct figure (and the numbers look to be correct for the initial query I posted without labels). What's going on? DrThneed (talk) 21:10, 5 September 2024 (UTC)

My fault: I should have counted distinct theses when getting the labels. This gives your expected result with labels:

Template:SPARQL2

Real shortlink: [1] Ainali (talk) 22:05, 5 September 2024 (UTC)

Thanks Jan, it needed a space after COUNT (https://w.wiki/B72w) but otherwise works! I'd like to understand why adding labels requires a 'distinct' here when it doesn't for the same query on the main graph. Is that something you can explain? DrThneed (talk) 22:21, 5 September 2024 (UTC)

@DrThneed The reason is a limitation of federation and Blazegraph. In Wikidata:SPARQL_query_service/WDQS_graph_split/Federation_Limits we explain that federation can happen in two different ways:

the host service sending data to the federated service (least efficient)
the host service receiving data from the federated service

In your query, federation works by sending the publications to the wikidata_main subgraph endpoint. Because there are many publications, it makes multiple requests (sending them in chunks), but the types it asks about are likely the same in each chunk, so it retrieves the same label multiple times. Blazegraph is unable to determine that these are the same types, so the duplicates remain.
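To make the duplication concrete, here is a toy Python model. It is an illustration under assumptions, not Blazegraph's actual federation code: `federated_label_rows` and `count_per_type` are hypothetical helpers, and the thesis and type names are invented. Each chunk of publications is assumed to return one label row per type it contains, rows from different chunks are kept as-is, and every thesis is then joined against all of them.

```python
from collections import defaultdict


def federated_label_rows(thesis_types, chunk_size):
    """Toy model (an assumption, not Blazegraph's real algorithm).

    thesis_types maps a thesis to its type. Labels are requested chunk by
    chunk; each chunk yields one (type, label) row per distinct type in it,
    and rows from different chunks are NOT merged.
    """
    pairs = list(thesis_types.items())
    rows = []
    for start in range(0, len(pairs), chunk_size):
        chunk = pairs[start:start + chunk_size]
        for thesis_type in sorted({t for _, t in chunk}):
            rows.append((thesis_type, f"label of {thesis_type}"))
    return rows


def count_per_type(thesis_types, label_rows, distinct):
    """Join theses with label rows on type, then COUNT or COUNT(DISTINCT)."""
    groups = defaultdict(list)
    for thesis, thesis_type in thesis_types.items():
        for row_type, _label in label_rows:
            if row_type == thesis_type:
                groups[thesis_type].append(thesis)
    return {t: len(set(v)) if distinct else len(v) for t, v in groups.items()}


# Six hypothetical master's theses, sent in chunks of two (three requests).
theses = {f"thesis{i}": "master's thesis" for i in range(6)}
rows = federated_label_rows(theses, chunk_size=2)
print(count_per_type(theses, rows, distinct=False))  # {"master's thesis": 18}
print(count_per_type(theses, rows, distinct=True))   # {"master's thesis": 6}
```

Only the plain count is inflated (by the number of chunks that mentioned each type), which is why counting DISTINCT theses brings the totals back in line.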

I think that a better way to do what you want is using query-main and pulling the publications from the scholarly subgraph:

SELECT ?thesisType ?thesisTypeLabel (COUNT(?thesis) AS ?count) 
WHERE {
 hint:Query hint:optimizer "None" .
 SERVICE wdsubgraph:scholarly_articles {
  ?thesis wdt:P4101 wd:Q1048626;
          wdt:P31 ?thesisType
 }
 ?thesisType rdfs:label ?thesisTypeLabel .
 FILTER (LANG(?thesisTypeLabel) = 'en')
}  
GROUP BY ?thesisType ?thesisTypeLabel ORDER BY DESC (?count)

Try it DCausse (WMF) (talk) 09:18, 6 September 2024 (UTC)

Thank you for the explanation @DCausse (WMF), that's really helpful. So much to learn! DrThneed (talk) 04:52, 7 September 2024 (UTC)

inferring narrower occupations

Problem: we have large numbers of people with a sole occupation of "researcher" and a description either "researcher" or based on an ORCID. This makes disambiguation really hard.

Proposed solution: most journals have a main subject, and many of those subjects are linked to an occupation by P3095, so we can link a human through their articles to journals, then to topics and occupations. If a person has 10 articles in Wikidata, picking the most common occupation linked to them should be a good approximation of their occupation.
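Once the per-article occupations are out of the query service, the "most common occupation" step is a small computation. A minimal Python sketch, assuming you have already fetched, for each of a person's articles, the occupation reached through the journal's main subject and P3095 (or None where that chain is missing). `only_researcher` and `infer_occupation` are hypothetical helpers, the occupation strings are placeholders for Wikidata items, and returning None on a tie is a choice of this sketch rather than part of the proposal.

```python
from collections import Counter

# Hypothetical helpers for illustration; occupation strings stand in for items.


def only_researcher(occupations):
    """True if a person's sole occupation is "researcher" (the case to refine)."""
    return set(occupations) == {"researcher"}


def infer_occupation(article_occupations, min_articles=10):
    """Pick the most common occupation linked to a person's articles.

    article_occupations has one entry per article: the occupation reached
    via the journal's main subject and P3095, or None if there is none.
    Returns None when there are too few articles, no usable links, or a tie.
    """
    if len(article_occupations) < min_articles:
        return None
    counts = Counter(o for o in article_occupations if o is not None)
    if not counts:
        return None
    ranked = counts.most_common(2)
    # Refuse to guess when the two leading occupations are tied.
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Hypothetical person with 10 articles.
articles = ["chemist"] * 6 + ["biologist"] * 3 + [None]
print(only_researcher(["researcher"]))  # True
print(infer_occupation(articles))       # chemist
```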

Problem: so far, my query times out. How do I make it fast enough that it doesn't time out? And how do I ignore people who have an occupation of "researcher" AND another occupation?

Template:SPARQL2

Secondary problem: how do I find academic journals without P921 and P3095 statements?

Stuartyeates (talk) 10:18, 11 September 2024 (UTC)

Islands

A list of islands whose name (in English) begins with a letter A-H.

Thank you! — Martin (MSGJ · talk) 13:04, 11 September 2024 (UTC)

Something like this...

Template:SPARQL Piecesofuk (talk) 14:53, 11 September 2024 (UTC)

Help with WDGS

Hi, I have a number of queries written as part of the project Wikidata:WikiProject LSEThesisProject and will need to rewrite them because of the Graph Split. My SPARQL knowledge is basic, and the queries were produced by trial and error, by modifying other people's queries, and with kind help from the community. To prepare for learning how to rewrite them, I tried, using the Federation Guide, to write federated queries that pick up all research outputs produced by an academic. This includes not only scholarly articles, but also book chapters, version edition translations, blog posts, chapters and articles. In the main graph as it was, all of these could be picked up in one query (https://w.wiki/B6Ct), but I'm failing to rewrite this for the scholarly graph. I've tried

SELECT ?item ?itemLabel ?itemType ?itemTypeLabel
WHERE
{
  ?item wdt:P50 wd:Q17508688.
  SERVICE wdsubgraph:wikidata_main {
    ?item wdt:P50 wd:Q17508688.
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],mul,en". } # Helps get the label in your language, if not, then default for all languages, then en language
}

This gives me no results.

And I've tried

SELECT ?item ?itemLabel ?itemType ?itemTypeLabel
WHERE
{
  ?item wdt:P50 wd:Q17508688.
  UNION
  { SERVICE wdsubgraph:wikidata_main { ?item wdt:P50 wd:Q17508688} }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],mul,en". } # Helps get the label in your language, if not, then default for all languages, then en language
}

This gives an error message saying the query is malformed at UNION.

Would someone be able to point out what I'm doing wrong and show me how to write these queries?

Thanks HelsKRW (talk) 08:40, 4 September 2024 (UTC)

@HelsKRW UNION requires both of its parts to be wrapped in curly brackets:

  { ?item wdt:P50 wd:Q17508688. } 
  UNION 
  { SERVICE wdsubgraph:wikidata_main { ?item wdt:P50 wd:Q17508688}  }

Below is your query rewritten to run on https://query-main.wikidata.org/:

SELECT ?item ?itemLabel ?itemType ?itemTypeLabel WHERE {
  VALUES (?author) {(wd:Q17508688)}
  {
    # get the publications from the scholarly subgraph 
    SERVICE wdsubgraph:scholarly_articles {
      ?item wdt:P50 ?author ;
            wdt:P31 ?itemType
      # Instruct the label service to gather the label of the publication
      # The label for ?itemType will be fetched in the host query, the type is probably part of the main graph
      BIND(?itemLabel AS ?itemLabel)
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    }
  } UNION {
    # Union them with the publications in the main graph (blogs, articles...)
    ?item wdt:P50 ?author ;
          wdt:P31 ?itemType
  }  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

Try it DCausse (WMF) (talk) 11:28, 4 September 2024 (UTC)

Thank you very much for your help. I've modified the query I'd written for the scholarly graph, which is now working, and I can see that the longer query you've written for the main graph is also working. Could you tell me more about how to decide whether a query should be written on the scholarly graph or the main graph? And could you tell me more about the VALUES, BIND and UNION commands in the query you've written for the main graph? Using this query, I've tried modifying some other queries, but I keep hitting error messages, and despite reading the federation guide I am struggling to get to grips with how to write a federated query. Thanks HelsKRW (talk) 10:25, 5 September 2024 (UTC)

Unfortunately, while writing Wikidata:SPARQL_query_service/WDQS_graph_split/Internal_Federation_Guide I could not find a reasonable and comprehensive set of criteria for deciding whether it's better to use query-main or query-scholarly for the host query. Generally both are doable, but for certain queries the choice greatly affects the complexity of the query.

My suggestion would be to start with query-main (this is the one I used most often when writing Wikidata:SPARQL_query_service/WDQS_graph_split/Federated_Queries_Examples) and to consider query-scholarly if the query turns out to be difficult to write. I hope that with more examples we can improve the guide over time.

VALUES is a SPARQL feature that lets you bind a variable to one or more fixed values. I used it to avoid repeating wd:Q17508688 in the two clauses around UNION, so that you can change it in a single place when you want to see the publications of another author.
BIND(?itemLabel AS ?itemLabel) is a trick we use to make the wikibase:label service understand that we want to keep the label of the item; this is explained at Wikidata:SPARQL_query_service/WDQS_graph_split/Internal_Federation_Guide#Misplacing_the_label_service. In general, BIND assigns a value to a variable: for instance, in place of VALUES (?author) {(wd:Q17508688)} I could have written BIND(wd:Q17508688 AS ?author).
UNION collects the results of multiple expressions: { EXPRESSION1 } UNION { EXPRESSION2 }. In the query above, EXPRESSION1 extracts the scientific publications (?item) and their labels (?itemLabel) from the scholarly subgraph, and EXPRESSION2 collects the other publications (blogs, articles) from the host service (here serving the wikidata_main graph). DCausse (WMF) (talk) 13:11, 5 September 2024 (UTC)
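Because VALUES keeps the author in one place, it is also easy to generate from a script, for example to run the same query for several authors. A minimal Python sketch, assuming you assemble the query text yourself; `values_clause` is a hypothetical helper, and `VALUES ?author { ... }` is the one-variable form of the `VALUES (?author) {(...)}` line used in the query above.

```python
import re


# Hypothetical helper for illustration; not part of any Wikidata tool.
def values_clause(variable, qids):
    """Build a one-variable SPARQL VALUES block from Wikidata item IDs."""
    for qid in qids:
        # Reject anything that is not a plain item ID such as Q17508688.
        if not re.fullmatch(r"Q[1-9][0-9]*", qid):
            raise ValueError(f"not a Wikidata item ID: {qid!r}")
    items = " ".join(f"wd:{qid}" for qid in qids)
    return f"VALUES ?{variable} {{ {items} }}"


print(values_clause("author", ["Q17508688"]))
# VALUES ?author { wd:Q17508688 }
```

With several IDs, the query returns publications for all of them; adding ?author to the SELECT shows which author each row matched.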

Thank you. In practice, I seem to be struggling with the UNION command: I've tried it in multiple queries and always get an error message, whatever combination of curly brackets I try!

If I take this query from my thesis project, https://w.wiki/5aHL, which gives me a list of LSE's doctoral theses with author links to Wikipedia pages where available, and try to rewrite it for the new main graph, I edit it to include the optimizer hint, the SERVICE for the scholarly graph, and BIND. The query runs, but gives me no results: https://w.wiki/B7Fj

So I try to add the UNION command, but whatever I do with curly bracket combinations I get an error message, so I can't run the query:

SELECT ?thesis ?thesisDescription ?thesisLabel ?author ?authorLabel ?authorwp ?lse_url WHERE {
  hint:Query hint:optimizer "None" .
  SERVICE wdsubgraph:scholarly_articles {
    ?thesis wdt:P31/wdt:P279* wd:Q1266946 ;
      wdt:P953 ?lse_url.
    BIND(?thesisLabel AS ?thesisLabel)
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  }
  } UNION {
    # Union them with the publications in the main graph (blogs, articles...)
    ?thesis wdt:P31/wdt:P279* wd:Q1266946 ;
      wdt:P953 ?lse_url.
  }
  OPTIONAL {
    ?thesis wdt:P50 ?author.
    OPTIONAL {
      ?authorwp schema:about ?author;
        schema:isPartOf <https://en.wikipedia.org/>.
    }
  }
  FILTER(STRSTARTS(STR(?lse_url), "http://etheses.lse.ac.uk"))
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY (?thesisDescription)

Are you able to advise what I’m doing wrong on this one? HelsKRW (talk) 10:10, 6 September 2024 (UTC)

@HelsKRW Your query is syntactically incorrect because its opening and closing curly brackets are not balanced. With complicated queries like this, I highly recommend using a proper wikipedia:Indentation_style to quickly identify where the problem is.

Every time a curly bracket is opened, indent the next line 2 spaces further to the right; when one is closed, remove 2 spaces. Open or close only one curly bracket per line. With your query, this would probably have shown you that the problem is right before the UNION, where there is an extra closing curly bracket.
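The 2-space rule is mechanical enough to automate. A minimal Python sketch using only the standard library; `reindent` is a hypothetical helper, not a WDQS feature. It assumes that braces inside double- or single-quoted strings, `<...>` IRIs and `#` comments should be ignored, and it does not handle triple-quoted literals.

```python
import re

# Hypothetical helper for illustration; not part of WDQS or any library.
# Tokens are matched left to right, so braces inside strings, IRIs and
# comments are swallowed by those alternatives and never counted.
_TOKEN = re.compile(
    r'"(?:[^"\\\n]|\\.)*"'          # "double-quoted" string literal
    r"|'(?:[^'\\\n]|\\.)*'"         # 'single-quoted' string literal
    r'|<[^<>"{}|^`\\\x00-\x20]*>'   # IRI such as <https://en.wikipedia.org/>
    r'|#.*'                         # comment up to the end of the line
    r'|[{}]'                        # a curly bracket
)


def reindent(query, width=2):
    """Re-indent a SPARQL query by curly-bracket depth.

    Returns (text, problems): the re-indented text and messages about
    closing brackets with no matching opening one and brackets never closed.
    """
    depth = 0
    open_lines = []  # line numbers of brackets that are still open
    problems = []
    out = []
    for lineno, raw in enumerate(query.splitlines(), start=1):
        line = raw.strip()
        # A line starting with '}' is printed at the level it closes back to.
        level = depth - 1 if line.startswith("}") else depth
        out.append(" " * (width * max(level, 0)) + line)
        for tok in _TOKEN.findall(line):
            if tok == "{":
                depth += 1
                open_lines.append(lineno)
            elif tok == "}":
                if depth == 0:
                    problems.append(f"line {lineno}: '}}' closes nothing")
                else:
                    depth -= 1
                    open_lines.pop()
    problems += [f"line {n}: '{{' is never closed" for n in open_lines]
    return "\n".join(out), problems


# Skeleton of the query above, with the extra '}' before UNION.
broken = """SELECT ?thesis WHERE {
SERVICE wdsubgraph:scholarly_articles {
?thesis wdt:P953 ?lse_url .
}
} UNION {
?thesis wdt:P953 ?lse_url .
}
}"""
text, problems = reindent(broken)
print(text)
print(problems)  # ["line 8: '}' closes nothing"]
```

In the printed output, `} UNION {` lands at the same level as `SELECT`, showing that the WHERE block was already closed before the UNION; line 8 is only where the surplus bracket is finally detected.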

Similarly, when you don't repeat the subject in the patterns (when using ;), try to align the predicates like this:

?thesis wdt:P31/wdt:P279* wd:Q1266946 ;
        wdt:P953 ?lse_url .

This makes it clearer that wdt:P953 applies to ?thesis.

Beyond that, several other things were incorrect:

You need the theses' descriptions, which are extracted via the label service. In the federated query, you need to tell this service that you want them, either with BIND(?thesisDescription AS ?thesisDescription), in the same way you bind ?thesisLabel, or by selecting them in a SELECT.
The pattern ?thesis wdt:P50 ?author. matches a triple owned by the publication and thus must also be part of the federated query on scholarly_articles (see Wikidata:SPARQL_query_service/WDQS_graph_split/Internal_Federation_Guide#What_is_where?)
You were selecting theses using the property path wdt:P31/wdt:P279*, which requires triples from the main graph; this is also explained in the section linked above.
Finally, you are returning a variable bound inside an OPTIONAL clause. Such variables are awkward to handle with federation; see Wikidata:SPARQL_query_service/WDQS_graph_split/Internal_Federation_Guide#Returning_variables_bound_by_OPTIONAL for how we work around this difficulty.

Please see below your query rewritten with federation (to run on query-main) and some explanations in the comments:

SELECT
  ?thesis
  ?thesisDescription
  ?thesisLabel
  (COALESCE(IF(BOUND(?author), ?author, 'N/A')) AS ?author)
  ?authorLabel (COALESCE(IF(BOUND(?authorwp), ?authorwp, 'N/A')) AS ?authorwp)
  ?lse_url
WHERE {
  hint:Query hint:optimizer "None" .
  # Ideally we want to select thesis with: ?thesis wdt:P31/wdt:P279* wd:Q1266946
  # This property path might require navigating triples in the two subgraphs and thus we can't use it
  # We extract ?thesisType first so that we will match it with a simple pattern ?thesis wdt:P31 ?thesisType
  ?thesisType wdt:P279* wd:Q1266946 .
  {
    SERVICE wdsubgraph:scholarly_articles {
      SELECT ?thesis ?thesisLabel ?thesisDescription ?thesisType ?lse_url (COALESCE(IF(BOUND(?author), ?author, 'N/A')) AS ?author) { 
        ?thesis wdt:P31 ?thesisType ;
                wdt:P953 ?lse_url.
        FILTER(STRSTARTS(STR(?lse_url), "http://etheses.lse.ac.uk"))
        # We return a variable bound in an OPTIONAL clause, so we have to be careful here
        # see https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Internal_Federation_Guide#Returning_variables_bound_by_OPTIONAL
        OPTIONAL { ?thesis wdt:P50 ?author. }
        # No need to use the BIND(?thesisLabel AS ?thesisLabel)/BIND(?thesisDescription AS ?thesisDescription) trick here since we wrap our federated query
        # with a SELECT to work around issues with the optionally bound ?author variable
        SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
      }    
    }    
  } UNION {
    # Union them with the publications in the main graph (blogs, articles...)
    ?thesis wdt:P31 ?thesisType ;
            wdt:P953 ?lse_url.
    FILTER(STRSTARTS(STR(?lse_url), "http://etheses.lse.ac.uk"))
    OPTIONAL { ?thesis wdt:P50 ?author. }
  }
  OPTIONAL {
    ?authorwp schema:about ?author;
              schema:isPartOf <https://en.wikipedia.org/> .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY (?thesisDescription)

DCausse (WMF) (talk) 13:37, 6 September 2024 (UTC)

Thank you for this, and for all the extra detail to help my learning, which I'm just working through. On a couple of days I've tried to save the query on the main graph, but I get a message saying URL shortening failed. I'm getting the same with one other query on the main graph today, though I have been able to get shortened URLs for plenty of other queries. Is this the place to report that, or somewhere else? Thanks! HelsKRW (talk) 11:22, 12 September 2024 (UTC)

Unfortunately, this is a known limitation that I run into myself. I'm not sure how others work around it, but I simply copy and paste the whole URL into the wikitext. If I want to show the query on the page, I sadly have to include it twice:

- once with the mw:Extension:SyntaxHighlight using lang="sparql"

- once by pasting the full URL into an external link like: [https://query-main.wikidata.org/#AWFULLY%20LONG%20AND%20UNREADABLE%20URL%20PARAMETERS Try it!]

<syntaxhighlight lang="sparql">
SELECT * {?s ?p ?o} LIMIT 1
</syntaxhighlight>
[https://query-main.wikidata.org/#SELECT%20%2a%20%7B%3Fs%20%3Fp%20%3Fo%7D%20LIMIT%201 Try It!]
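Building the long URL by hand is error-prone, so it can help to generate it. A minimal Python sketch, assuming the query-main URL layout shown in the example above (the percent-encoded query after `#`); `try_it_link` is a hypothetical helper. `urllib.parse.quote` with `safe=''` also encodes spaces and square brackets, which would otherwise break a wikitext external link.

```python
from urllib.parse import quote


# Hypothetical helper for illustration; assumes the query-main URL layout above.
def try_it_link(query, endpoint="https://query-main.wikidata.org/", text="Try it!"):
    """Return a wikitext external link that opens `query` on a WDQS endpoint."""
    # safe='' percent-encodes everything except letters, digits and _.-~
    return f"[{endpoint}#{quote(query, safe='')} {text}]"


print(try_it_link("SELECT * {?s ?p ?o} LIMIT 1"))
# [https://query-main.wikidata.org/#SELECT%20%2A%20%7B%3Fs%20%3Fp%20%3Fo%7D%20LIMIT%201 Try it!]
```

Python writes upper-case hex (`%2A`) where the example above has `%2a`; RFC 3986 treats the two as equivalent.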

Template:SPARQL does not yet support query-main or query-scholarly, but if it does at some point, I suppose that might be quite handy. DCausse (WMF) (talk) 06:48, 13 September 2024 (UTC)

Thank you! HelsKRW (talk) 10:18, 13 September 2024 (UTC)

Slightly different results after federating a query

I noticed slightly different numbers in the results between my ordinary query and my query rewritten for WDGS. What's going on? (Probably I did something wrong!) The query counts the types of the things that are main subjects of my theses. The original query: Template:SPARQL2 The rewritten query: Template:SPARQL2 DrThneed (talk) 22:21, 11 September 2024 (UTC)

Oh, I realised it probably means there are one or more publications in the thesis project that aren't in the scholarly subgraph for some reason, and their main subjects account for the difference. We have a few things like reports, papers, etc., but I would have thought they all fell into the scholarly subgraph. How can I figure out which publications those are? DrThneed (talk) 22:38, 11 September 2024 (UTC)

OK, never mind: I reviewed the list of types of things in the project. I suspect there is a qualification or similar item that falls within the project and has a main subject statement on it, but isn't a publication. DrThneed (talk) 23:18, 11 September 2024 (UTC)
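One way to pin down which items cause such a difference is to change both queries to list items instead of counting them, download each result as CSV from the query service, and compare the two files. A minimal Python sketch; `items_only_in` is a hypothetical helper, and it assumes both exports have a column named `item` (in the CSV format, the selected variable name without `?`).

```python
import csv
import io


# Hypothetical helper for illustration; assumes an "item" column in both exports.
def items_only_in(first_csv, second_csv, column="item"):
    """Return values of `column` found in the first CSV text but not the second."""
    def read(text):
        return {row[column] for row in csv.DictReader(io.StringIO(text))}
    return sorted(read(first_csv) - read(second_csv))


# Placeholder exports: the original query found Q1 and Q2, the federated one only Q1.
original = "item\nhttp://www.wikidata.org/entity/Q1\nhttp://www.wikidata.org/entity/Q2\n"
federated = "item\nhttp://www.wikidata.org/entity/Q1\n"
print(items_only_in(original, federated))
# ['http://www.wikidata.org/entity/Q2']
```

For real files, read each one with `open(path, newline="", encoding="utf-8").read()` and pass the text in; the Q1/Q2 IRIs above are only placeholders.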

Query to find all Renaissance Artists born in Italy

Hi, I am totally new to Wikidata and SPARQL. I am studying, but an example to start with would be awesome! Can I get the names of all artists from the Renaissance movement who were born in Italy? Is that sufficient information to create a query? Thank you! 93.151.230.93 20:13, 15 September 2024 (UTC)
