XQuery is a powerful, functional query language specifically designed for querying and manipulating data stored in XML (Extensible Markup Language) format. Its core characteristic is its ability to work directly on hierarchical and semi-structured data, unlike SQL, which is optimized for flat, relational tables. XQuery enables you to:
Years > 7
and Status = 'Gold'
).XQuery is a declarative language, meaning you specify what data you want, and the XQuery processor determines how to retrieve and process it. This makes it highly suitable for working with large, complex, and often heterogeneous structured datasets found in domains like finance, healthcare, digital publishing, and government.
hw8.xml
and hw8.xq
The XML file, hw8.xml
, presumably contains client data for a financial startup. Each client record includes elements such as Name
, City
, Account_Total
, Years
(of being a client), and Status
.
The XQuery file, hw8.xq
, is designed to filter these clients based on specific criteria:
Account_Total
is greater than $100,000.The query constructs a new XML element,
, for each client meeting these conditions.
Query Code (hw8.xq
):
xquery version "1.0";
(: This query selects clients from 'hw8.xml' based on tenure and account balance. :)
for $client in doc("hw8.xml")//Client
where xs:integer($client/Years) >= 7
and xs:decimal($client/Account_Total) > 100000
order by xs:decimal($client/Account_Total) descending
return
{
$client/Name,
$client/City,
$client/Account_Total,
$client/Years,
$client/Status
}
Note: I've added an order by
clause to the original query to sort the results by Account_Total
in descending order, which is often useful for such reports.
hw8.xq
)Assuming hw8.xml
contains appropriate data, the output would look like this (sorted by Account_Total
):
Sophia Garcia New York 198000 10 Platinum Emily Anderson San Francisco 150000 9 Gold
(The original OCR'd output was a summary; this shows the expected XML structure.)
Scenario: Financial institutions often deal with XML for transaction logs, client portfolios, and regulatory reporting (e.g., FpML, FIXML).
Example 1: List high-value clients (from clients.xml
) who have been with the firm for over 5 years, and whose account total exceeds $200,000, ordered by name.
for $client in doc("clients.xml")//Client
let $account_total := xs:decimal($client/Account_Total)
let $years_with_firm := xs:integer($client/Years)
where $years_with_firm > 5 and $account_total > 200000
order by $client/Name/LastName, $client/Name/FirstName
return
{$client/Name}
{$account_total}
{$years_with_firm}
Example 2: Calculate the total sum of Account_Total
for all 'Platinum' status clients.
let $platinum_clients := doc("clients.xml")//Client[Status = 'Platinum']
return
{sum($platinum_clients/Account_Total)}
{count($platinum_clients)}
Result (Example 1 - Sample):
Sophia Garcia 250000 10
Scenario: Healthcare uses XML extensively for patient records (HL7 CDA), medical imaging metadata, and research data.
Example 1: Retrieve names and encounter dates for male patients over 60 diagnosed with 'Hypertension' from patients.xml
.
for $patient in doc("patients.xml")//Patient
where $patient/Demographics/Gender = "Male"
and xs:integer($patient/Demographics/Age) > 60
and $patient/Encounters/Encounter/Diagnosis/Code = "I10" (: ICD-10 for Hypertension :)
return
{$patient/Name}
{
for $encounter in $patient/Encounters/Encounter[Diagnosis/Code = "I10"]
return {$encounter/Date}
}
Example 2: Count the number of patients for each unique primary diagnosis listed in the patients.xml
.
for $diag_code in distinct-values(doc("patients.xml")//Patient/Encounters/Encounter/Diagnosis[@primary='true']/Code)
let $patients_with_diag := doc("patients.xml")//Patient[Encounters/Encounter/Diagnosis[@primary='true']/Code = $diag_code]
return
{count($patients_with_diag)}
{doc("diagnoses_codes.xml")//Diagnosis[Code=$diag_code]/Description/text()} (: Assuming a separate lookup XML :)
Result (Example 1 - Sample):
John Doe 2023-05-10 2024-01-15
Scenario: Academic publishers, libraries, and researchers use XML formats like JATS (Journal Article Tag Suite), TEI (Text Encoding Initiative), and PubMed XML for articles, books, and metadata.
Example 1: Return titles and abstracts of articles published since 2020 containing the keyword "Artificial Intelligence" in their
or
sections from articles.xml
.
for $article in doc("articles.xml")//Article
where (contains(lower-case($article/Keywords), "artificial intelligence")
or contains(lower-case($article/Abstract), "artificial intelligence"))
and xs:integer($article/PublicationDate/Year) >= 2020
order by $article/PublicationDate/Year descending
return
{$article/Title/text()}
{substring($article/Abstract/text(), 1, 200)}... (: Truncated abstract :)
{$article/PublicationDate/Year/text()}
Example 2: List all unique journal titles and the count of articles published in each from articles.xml
.
for $journal_title in distinct-values(doc("articles.xml")//Article/Journal/Title)
let $articles_in_journal := doc("articles.xml")//Article[Journal/Title = $journal_title]
order by $journal_title
return
{$journal_title}
{count($articles_in_journal)}
Result (Example 1 - Sample):
The Rise of AI in Modern Healthcare This paper explores the transformative impact of artificial intelligence on diagnostic processes and patient care... 2023
In our MuniBuddy project, we interface with real-time transit data, often provided in SIRI (Service Interface for Real-Time Information) XML format from sources like 511.org. This XML contains deeply nested structures, such as
and
, with crucial data fields like
,
, and
.
While our backend primarily uses Python for parsing and processing this XML (often converting it to Python dictionaries/objects), the filtering logic mirrors what XQuery is designed for.
Python equivalent logic for filtering outbound arrivals:
# Python Example (conceptual)
siri_data = parse_xml_to_dict(siri_xml_feed) # Assume this function exists
outbound_arrivals = []
if "StopMonitoringDelivery" in siri_data:
for stop_visit in siri_data["StopMonitoringDelivery"].get("MonitoredStopVisit", []):
mvj = stop_visit.get("MonitoredVehicleJourney", {})
if mvj.get("DirectionRef") == "Outbound" and "ExpectedArrivalTime" in mvj:
outbound_arrivals.append(mvj["ExpectedArrivalTime"])
The equivalent XQuery to extract expected arrival times for outbound vehicles would be:
(: XQuery Example for SIRI data :)
for $visit in doc("siri_feed.xml")//MonitoredStopVisit
let $journey := $visit/MonitoredVehicleJourney
where $journey/DirectionRef = "Outbound"
and exists($journey/MonitoredCall/ExpectedArrivalTime)
return
{$journey/VehicleRef/text()}
{$journey/LineRef/text()}
{$journey/MonitoredCall/ExpectedArrivalTime/text()}
This comparison highlights how XQuery principles—traversing structured XML, applying conditional filters, and extracting relevant elements—are fundamental even when implemented in other languages. In MuniBuddy, this logic was crucial for filtering and normalizing transit data before serving it to our frontend map application.
XML is more than just a data format; it's a self-describing, schema-driven, and highly structured markup language designed for interoperability across diverse platforms and industries. Its trustworthiness, especially in security-critical sectors like finance and healthcare, stems from several key properties:
These characteristics make XML a preferred format in sectors where data precision, structural integrity, security, and long-term validation are paramount, often outweighing concerns about verbosity or the simplicity of alternative formats.
XQIB (XQuery in the Browser) was a lightweight JavaScript library that aimed to enable developers to write and execute XQuery 1.0 expressions directly within a web browser environment. The goal was to allow client-side XML filtering, transformation, and querying using XQuery syntax, potentially for dynamic updates of web page content based on local or fetched XML data.
However, XQIB did not achieve widespread adoption due to several factors:
This section demonstrates a real XQuery operation. The query written in hw8.xq
was executed using the BaseX XQuery engine on a file named hw2.xml
, located in the same directory.
The purpose of the query was to list books that have more than 300 pages. The following result was generated directly from that XML data:
Networking Infrastructure Adam Douglas & Janice Dall Network Security Jennifer Starter & Yuri Adantov XML & JSON Stuart Bell
The original XQuery (in hw8.xq
) used to produce this output:
xquery version "1.0";
for $b in doc("hw2.xml")/bookstore/book
where xs:decimal($b/price) < 30
return
<BookInfo>
<Title>{$b/title/text()}</Title>
<Author>{$b/author/text()}</Author>
</BookInfo>