Data Dumps - Freebase API Google Developers

04:10 | Author: David Perry

Freebase Data Dumps are provided by Google free of charge for any purpose and are updated regularly. Like Freebase itself, they are distributed under the Creative Commons Attribution (CC-BY) license, and their use is subject to the Terms of Service. The Freebase/Wikidata ID mappings are provided under CC0 and can be used without restriction.

If you're writing your own code to parse the RDF dumps, it's often more efficient to read directly from the gzip file than to extract the data first and then process the uncompressed file.
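As a rough illustration of reading straight from the compressed file, the sketch below streams lines with Python's standard gzip module. The function name iter_dump_lines and the file path passed to it are hypothetical; nothing here is specific to an official Freebase tool.

```python
import gzip
from typing import Iterator


def iter_dump_lines(path: str) -> Iterator[str]:
    """Yield decoded lines from a gzip-compressed dump without extracting it."""
    # "rt" mode decompresses and decodes on the fly, so only a small
    # buffer is held in memory regardless of the size of the dump.
    with gzip.open(path, "rt", encoding="utf-8", newline="\n") as handle:
        for line in handle:
            yield line.rstrip("\n")
```

Because the function is a generator, a caller can stop early (for example after sampling the first few thousand triples) without decompressing the rest of the file.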

Topic descriptions often contain newlines. To make each triple fit on one line, newlines are escaped as "\n".
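To recover the original description text, the escaping has to be reversed after parsing. The sketch below assumes the escape sequence is the two-character backslash followed by n, and additionally assumes that literal backslashes are doubled, as in N-Triples string literals; neither detail is confirmed by this page, and the function name unescape_description is hypothetical.

```python
import re

# Assumed escape table: backslash-n becomes a newline, a doubled
# backslash becomes a single backslash. Other escapes are left as-is.
_ESCAPES = {"\\n": "\n", "\\\\": "\\"}


def unescape_description(text: str) -> str:
    """Reverse the newline escaping applied to description literals."""
    # A single left-to-right pass keeps an escaped backslash followed by
    # the letter n from being mistaken for an escaped newline.
    return re.sub(r"\\[n\\]", lambda match: _ESCAPES[match.group(0)], text)
```

A plain str.replace would be shorter, but it would misread an escaped backslash that happens to precede the letter n, which is why the sketch scans once with a regular expression.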

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 3.0 License, and code samples are licensed under the Apache 2.0 License. For details, see our Site Policies.

Note: In Freebase, objects have MIDs that look like /m/012rkqx. In RDF those MIDs become m.012rkqx. Likewise, Freebase schema like /common/topic are written as common.topic.
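The mapping described in the note is mechanical, so it can be sketched as a small helper. The function name freebase_id_to_rdf is hypothetical, and the sketch assumes the ID is a slash-separated path with a leading slash, as in the two examples above.

```python
def freebase_id_to_rdf(freebase_id: str) -> str:
    """Convert a Freebase ID such as /m/012rkqx to its RDF form m.012rkqx."""
    if not freebase_id.startswith("/"):
        raise ValueError(f"expected a leading slash: {freebase_id!r}")
    # Drop the leading slash, then turn the remaining slashes into dots.
    return freebase_id[1:].replace("/", ".")


print(freebase_id_to_rdf("/m/012rkqx"))     # m.012rkqx
print(freebase_id_to_rdf("/common/topic"))  # common.topic
```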

Data Dumps are a downloadable version of the data in Freebase. They constitute a snapshot of the data stored in Freebase and the Schema that structures it, and are provided under the same CC-BY license. The Freebase/Wikidata mappings are provided under the CC0 license.

If you'd like to cite these data dumps in a publication, you may use: Or as BibTeX:

The subject is the ID of a Freebase object. It can be a Freebase MID (e.g. m.012rkqx) for topics and CVTs, or a human-readable ID (e.g. common.topic) for schema.

We also provide a dump of triples that have been deleted from Freebase over time. This is a one-time dump through March 2013. In the future, we might consider providing periodic updates of recently deleted triples, but at the moment we have no specific timeframe for doing so, and are only providing this one-time dump.

The dump is distributed as a .tar.gz file (2.1 GB compressed, 7.7 GB uncompressed). It contains 63,036,271 deleted triples in 20 files (there is no particular meaning to the individual files; it is just easier to manipulate several smaller files than one huge file). Thanks to Chun How Tan and John Giannandrea for making this data release possible.

Total triples: 63 million
Updated: June 9, 2013
Data Format: CSV
License: CC-BY
Size: 2 GB gzip, 8 GB uncompressed

The data format is essentially CSV with one important caveat. The object field may contain any characters, including commas (as well as any other reasonable delimiters you could think of). However, all the other fields are guaranteed not to contain commas, so the data can still be parsed unambiguously. The columns in the dataset are defined as:
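The column definitions are not reproduced here, so the following sketch of the comma caveat takes the total column count and the position of the object field as parameters rather than hard-coding them. The function name split_deleted_triple and the sample row are hypothetical. The idea is to split from the left up to the object field and from the right after it, so any commas left over belong to the object.

```python
from typing import List


def split_deleted_triple(line: str, num_columns: int, object_index: int) -> List[str]:
    """Split one row whose object field may itself contain commas."""
    before = object_index                   # comma-free fields left of the object
    after = num_columns - object_index - 1  # comma-free fields right of the object

    # Take the leading fields from the left; the final piece is the remainder.
    head = line.split(",", before)
    if len(head) != before + 1:
        raise ValueError(f"too few fields: {line!r}")
    rest = head.pop()

    # Take the trailing fields from the right; what remains is the object.
    if after:
        tail = rest.rsplit(",", after)
        if len(tail) != after + 1:
            raise ValueError(f"too few fields: {line!r}")
        return head + tail
    return head + [rest]


# Hypothetical six-column row with the object in position 4 (zero-based).
row = split_deleted_triple("t1,user,m.012rkqx,some.property,hello, world,en", 6, 4)
print(row[4])  # hello, world
```

A standard CSV reader is deliberately avoided in this sketch, since the text above says only that other fields are comma-free and does not say that the object field is quoted.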

The predicate is always a human-readable ID for a Freebase property or a property from a standard RDF vocabulary like RDFS. Freebase foreign key namespaces are also used as predicates to make it easier to look up keys by namespace.

The RDF data is serialized using the N-Triples format, encoded as UTF-8 text and compressed with Gzip.


The object field may contain a Freebase MID for an object or a human-readable ID for schema from Freebase or other RDF vocabularies. It may also include literal values like strings, booleans and numeric values.
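Putting the subject, predicate and object descriptions together, a line of the dump can be split into its three terms with a few lines of code. This is a minimal sketch, not a full N-Triples parser: it assumes one triple per line, whitespace-separated terms, and a terminating period, and it leaves IRIs and literals in their serialized form. The names Triple and parse_triple and the example.org namespace in the usage lines are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def parse_triple(line: str) -> Triple:
    """Split one N-Triples line into subject, predicate and object terms."""
    body = line.strip()
    if not body.endswith("."):
        raise ValueError(f"missing terminating period: {line!r}")
    body = body[:-1].rstrip()
    # Subject and predicate never contain whitespace, so two splits are
    # enough; the remainder is the object, which may contain spaces
    # when it is a string literal.
    parts = body.split(None, 2)
    if len(parts) != 3:
        raise ValueError(f"expected three terms: {line!r}")
    return Triple(*parts)


# Hypothetical namespace, used only to show the shape of a line.
sample = '<http://example.org/ns/m.012rkqx>\t<http://example.org/ns/type.object.name>\t"Hello world"@en\t.'
print(parse_triple(sample).obj)  # "Hello world"@en
```

For anything beyond quick filtering, a dedicated RDF library is likely a safer choice, since it also handles escapes, datatypes and language tags.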