Metadata SDK
Fetch and reconcile book metadata from multiple providers
The Metadata SDK helps you discover, normalize, and reconcile book metadata from multiple sources. Query 14+ providers, including Open Library, WikiData, Library of Congress, and Google Books, through a unified API, then merge the results into a single, high-confidence record.
Installation
npm install @colibri-hq/sdk
# or
pnpm add @colibri-hq/sdk
Quick Start
Search for a book by ISBN and get metadata from multiple sources:
import {
  MetadataAggregator,
  OpenLibraryMetadataProvider,
  WikiDataMetadataProvider,
} from "@colibri-hq/sdk/metadata";
const aggregator = new MetadataAggregator([
  new OpenLibraryMetadataProvider(),
  new WikiDataMetadataProvider(),
]);
const { results, consensus } = await aggregator.searchByISBN("978-0-14-028329-7");
console.log(results[0].title); // "The Great Gatsby"
console.log(results[0].authors); // ["F. Scott Fitzgerald"]
console.log(consensus?.confidence); // 0.92
Core Concepts
Providers
A provider fetches metadata from a single source. Each provider implements the same interface, so you can mix and match them freely.
import { OpenLibraryMetadataProvider } from "@colibri-hq/sdk/metadata";
const provider = new OpenLibraryMetadataProvider();
const results = await provider.searchByISBN("978-0-14-028329-7");
Aggregator
The aggregator queries multiple providers in parallel, deduplicates results, and calculates consensus confidence scores.
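As a rough illustration of what "in parallel" and "deduplicates" mean here, the sketch below fans one query out to several providers and then drops records whose ISBN has already been seen. The `Provider` and `BookRecord` shapes and the `queryAll` and `dedupeByIsbn` helpers are hypothetical names made up for this example, not the SDK's API. The sketch also simplifies: it discards later duplicates instead of merging them, and it does not match an ISBN-10 against its ISBN-13 form.

```typescript
// Hypothetical shapes for illustration; not the SDK's actual types.
interface BookRecord {
  source: string;
  isbn?: string[];
  title?: string;
}

interface Provider {
  name: string;
  searchByISBN(isbn: string): Promise<BookRecord[]>;
}

/** Query every provider concurrently; a failing provider is recorded, not rethrown. */
async function queryAll(providers: Provider[], isbn: string) {
  const errors = new Map<string, Error>();
  const perProvider = await Promise.all(
    providers.map(async (provider): Promise<BookRecord[]> => {
      try {
        return await provider.searchByISBN(isbn);
      } catch (error) {
        errors.set(provider.name, error instanceof Error ? error : new Error(String(error)));
        return [];
      }
    }),
  );
  // Flatten in provider order so the output is deterministic.
  const results = perProvider.reduce<BookRecord[]>((all, records) => all.concat(records), []);
  return { results, errors };
}

/** Keep the first record for each ISBN (hyphens and spaces ignored). */
function dedupeByIsbn(records: BookRecord[]): BookRecord[] {
  const seen = new Set<string>();
  return records.filter((record) => {
    const keys = (record.isbn ?? []).map((value) => value.replace(/[\s-]/g, ""));
    if (keys.some((key) => seen.has(key))) return false; // duplicate of an earlier record
    keys.forEach((key) => seen.add(key));
    return true;
  });
}
```

The SDK's own aggregator exposes this behavior through constructor options: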
import { MetadataAggregator } from "@colibri-hq/sdk/metadata";
const aggregator = new MetadataAggregator(providers, {
  timeout: 30_000, // Total timeout in milliseconds
  deduplicateByIsbn: true, // Merge records with matching ISBNs
  calculateConsensus: true, // Calculate agreement scores
});
Records
Every search returns an array of MetadataRecord objects:
interface MetadataRecord {
  id: string;
  source: string;
  confidence: number;
  timestamp: Date;
  title?: string;
  authors?: string[];
  isbn?: string[];
  publisher?: string;
  publicationDate?: Date;
  language?: string;
  pageCount?: number;
  subjects?: string[];
  description?: string;
  series?: { name: string; volume?: number };
  coverImage?: { url: string; width?: number; height?: number };
}
Searching for Metadata
Search by ISBN
ISBN searches provide the highest-confidence results. The SDK normalizes ISBNs automatically, so you can pass ISBN-10 or ISBN-13, with or without hyphens.
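To make the normalization step concrete, here is a minimal sketch of how an ISBN in either form can be reduced to a bare ISBN-13 using the standard check-digit rules. The helper names (`stripIsbn`, `isbn13CheckDigit`, `isValidIsbn10`, `toIsbn13`) are invented for this example and are not the SDK's implementation.

```typescript
// Hypothetical helpers for illustration; NOT the SDK's own implementation.

/** Strip hyphens and spaces, and upper-case a trailing "x" check digit. */
function stripIsbn(input: string): string {
  return input.replace(/[\s-]/g, "").toUpperCase();
}

/** Compute the ISBN-13 check digit for the first 12 digits. */
function isbn13CheckDigit(first12: string): string {
  let sum = 0;
  for (let i = 0; i < 12; i++) {
    // Weights alternate 1, 3, 1, 3, ...
    sum += Number(first12.charAt(i)) * (i % 2 === 0 ? 1 : 3);
  }
  return String((10 - (sum % 10)) % 10);
}

/** Validate an ISBN-10 via its weighted mod-11 checksum ("X" stands for 10). */
function isValidIsbn10(isbn: string): boolean {
  if (!/^\d{9}[\dX]$/.test(isbn)) return false;
  let sum = 0;
  for (let i = 0; i < 10; i++) {
    const char = isbn.charAt(i);
    sum += (char === "X" ? 10 : Number(char)) * (10 - i);
  }
  return sum % 11 === 0;
}

/** Return a bare ISBN-13, or null when the input is not a valid ISBN. */
function toIsbn13(input: string): string | null {
  const isbn = stripIsbn(input);
  if (/^\d{13}$/.test(isbn)) {
    return isbn13CheckDigit(isbn.slice(0, 12)) === isbn.charAt(12) ? isbn : null;
  }
  if (isValidIsbn10(isbn)) {
    // Prefix with "978", drop the old check digit, and recompute it.
    const first12 = "978" + isbn.slice(0, 9);
    return first12 + isbn13CheckDigit(first12);
  }
  return null;
}

toIsbn13("978-0-14-028329-7"); // "9780140283297"
toIsbn13("1234567890"); // null (checksum fails)
```

With the SDK, none of this is needed at the call site: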
// All of these work
await aggregator.searchByISBN("978-0-14-028329-7");
await aggregator.searchByISBN("9780140283297");
await aggregator.searchByISBN("0-14-028329-3");
Search by Title
Title searches are useful when you don’t have an ISBN. Use exactMatch for precise matching or leave it off for fuzzy search.
const results = await aggregator.searchByTitle({ title: "The Great Gatsby", exactMatch: false });
Search by Author
Find all books by a specific author:
const results = await aggregator.searchByCreator({
  name: "F. Scott Fitzgerald",
  fuzzy: true, // Match name variations
});
Multi-criteria Search
Combine multiple criteria for more targeted results. The aggregator uses the most specific criteria available.
const results = await aggregator.searchMultiCriteria({
  title: "Gatsby",
  authors: ["Fitzgerald"],
  language: "en",
  yearRange: [1920, 1930],
});
Available Providers
Free Providers
These providers require no authentication:
| Provider | Best for |
|---|---|
| OpenLibraryMetadataProvider | General book metadata, covers |
| WikiDataMetadataProvider | Authority data, linked entities |
| LibraryOfCongressMetadataProvider | Authoritative bibliographic data |
| ISNIMetadataProvider | Author identification |
| ViafMetadataProvider | Virtual authority records |
| InternetArchiveMetadataProvider | Public domain, historical works |
| CrossrefMetadataProvider | Academic works, DOI resolution |
| DOABMetadataProvider | Open access academic books |
| DNBMetadataProvider | German publications |
| BNBMetadataProvider | British publications |
import {
  MetadataAggregator,
  OpenLibraryMetadataProvider,
  WikiDataMetadataProvider,
  LibraryOfCongressMetadataProvider,
} from "@colibri-hq/sdk/metadata";
const aggregator = new MetadataAggregator([
  new OpenLibraryMetadataProvider(),
  new WikiDataMetadataProvider(),
  new LibraryOfCongressMetadataProvider(),
]);
Authenticated Providers
These providers require API keys:
| Provider | Registration |
|---|---|
| GoogleBooksMetadataProvider | Google Cloud Console |
| SpringerNatureMetadataProvider | Springer Nature API Portal |
| ISBNdbMetadataProvider | ISBNdb |
| AmazonPaapiMetadataProvider | Amazon Associates |
import { GoogleBooksMetadataProvider } from "@colibri-hq/sdk/metadata";
const googleBooks = new GoogleBooksMetadataProvider({ apiKey: process.env.GOOGLE_BOOKS_API_KEY });
Embedded Metadata
Extract metadata from ebook files (EPUB, MOBI, or PDF):
import { EmbeddedMetadataProvider } from "@colibri-hq/sdk/metadata";
const provider = new EmbeddedMetadataProvider();
const record = await provider.extractFromFile(file);
// Use the embedded data to search for additional metadata
if (record.isbn?.[0]) {
  const enriched = await aggregator.searchByISBN(record.isbn[0]);
}
Normalization
The SDK provides utilities to normalize metadata into consistent formats.
ISBN Normalization
Convert any ISBN format to a canonical form:
import { cleanIsbn, isValidIsbn, normalizeIsbn } from "@colibri-hq/sdk/metadata";
// Remove formatting
cleanIsbn("978-0-14-028329-7"); // "9780140283297"
// Validate and convert to ISBN-13
normalizeIsbn("0-14-028329-3"); // "9780140283297"
// Check validity
isValidIsbn("9780140283297"); // true
isValidIsbn("1234567890"); // false
Author Name Normalization
Standardize author name formats:
import { normalizeAuthorName, parseAuthorName, formatAuthorName } from "@colibri-hq/sdk/metadata";
// Convert "Last, First" to "First Last"
normalizeAuthorName("Fitzgerald, F. Scott"); // "F. Scott Fitzgerald"
// Parse into components
const parsed = parseAuthorName("F. Scott Fitzgerald");
// { first: 'F. Scott', last: 'Fitzgerald', middle: [] }
// Format as needed
formatAuthorName(parsed, "last-first"); // "Fitzgerald, F. Scott"
Language Code Normalization
Convert language codes to ISO 639-1:
import { normalizeLanguageCode } from "@colibri-hq/sdk/metadata";
normalizeLanguageCode("eng"); // "en"
normalizeLanguageCode("English"); // "en"
normalizeLanguageCode("de"); // "de"
Date Parsing
Parse publication dates from various formats:
import { parsePublicationDate, extractYear } from "@colibri-hq/sdk/metadata";
parsePublicationDate("2020-03-15"); // { year: 2020, month: 3, day: 15 }
parsePublicationDate("March 2020"); // { year: 2020, month: 3 }
parsePublicationDate("c. 1925"); // { year: 1925, approximate: true }
extractYear("Published in 2020"); // 2020
Reconciliation
When multiple providers return data for the same book, the SDK reconciles differences to produce a unified record.
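One simple way to picture reconciliation is a per-field vote in which each source's confidence acts as its weight. The sketch below is an assumption-laden illustration of that idea, not the SDK's reconciliation algorithm; `SourceRecord`, `pickByWeightedVote`, and `reconcileRecords` are hypothetical names.

```typescript
// Hypothetical record shape with a handful of scalar fields.
interface SourceRecord {
  source: string;
  confidence: number;
  title?: string;
  publisher?: string;
  pageCount?: number;
}

type ScalarField = "title" | "publisher" | "pageCount";
type FieldValue = string | number;

/** Pick the value whose supporting sources have the highest combined confidence. */
function pickByWeightedVote(records: SourceRecord[], field: ScalarField): FieldValue | undefined {
  const votes = new Map<string, { value: FieldValue; weight: number }>();
  for (const record of records) {
    const value = record[field];
    if (value === undefined) continue;
    // Compare case-insensitively so "The Great Gatsby" and "the great gatsby" count as one value.
    const key = String(value).trim().toLowerCase();
    const entry = votes.get(key) ?? { value, weight: 0 };
    entry.weight += record.confidence;
    votes.set(key, entry);
  }
  let best: { value: FieldValue; weight: number } | undefined;
  for (const candidate of Array.from(votes.values())) {
    // Strict ">" means the first-seen value wins a tie.
    if (best === undefined || candidate.weight > best.weight) best = candidate;
  }
  return best?.value;
}

/** Build one record from many by voting on each field independently. */
function reconcileRecords(records: SourceRecord[]) {
  return {
    title: pickByWeightedVote(records, "title"),
    publisher: pickByWeightedVote(records, "publisher"),
    pageCount: pickByWeightedVote(records, "pageCount"),
  };
}
```

Voting field by field means the merged record can take its title from one source and its page count from another, which is usually what you want when no single source is complete.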
Automatic Reconciliation
The aggregator handles reconciliation automatically when you enable deduplication:
const aggregator = new MetadataAggregator(providers, {
  deduplicateByIsbn: true,
  calculateConsensus: true,
});
const { results, consensus } = await aggregator.searchByISBN(isbn);
// Results are deduplicated and merged
// Consensus contains agreement metrics
console.log(consensus?.confidence); // 0.92
console.log(consensus?.agreementScore); // 0.88
Confidence Scoring
The SDK calculates confidence based on multiple factors:
| Factor | Description |
|---|---|
| Source count | More sources agreeing increases confidence |
| Source quality | Higher-priority providers carry more weight |
| Field agreement | Fields matching across sources boost confidence |
| Data completeness | More complete records score higher |
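The table does not say how these factors are weighted, so the following is only a sketch of one plausible combination: a weighted sum clamped to the range 0 to 1. The weights, the saturating source-count curve, and the names `ConfidenceInputs` and `combineConfidence` are all assumptions made for illustration.

```typescript
// Hypothetical inputs mirroring the four factors in the table above.
interface ConfidenceInputs {
  sourceCount: number; // how many providers returned the record
  sourceQuality: number; // 0..1, e.g. mean priority weight of those providers
  fieldAgreement: number; // 0..1, share of fields on which sources agree
  completeness: number; // 0..1, share of fields that are populated
}

// Illustrative weights only; the SDK's real weighting is not documented here.
const FACTOR_WEIGHTS = {
  sourceCount: 0.2,
  sourceQuality: 0.3,
  fieldAgreement: 0.3,
  completeness: 0.2,
};

function combineConfidence(inputs: ConfidenceInputs): number {
  // Saturating curve: each extra source helps, but with diminishing returns
  // (0 sources -> 0, 1 -> 0.5, 2 -> 0.75, 3 -> 0.875, ...).
  const countScore = 1 - Math.pow(0.5, inputs.sourceCount);
  const score =
    FACTOR_WEIGHTS.sourceCount * countScore +
    FACTOR_WEIGHTS.sourceQuality * inputs.sourceQuality +
    FACTOR_WEIGHTS.fieldAgreement * inputs.fieldAgreement +
    FACTOR_WEIGHTS.completeness * inputs.completeness;
  // Clamp so out-of-range inputs cannot push the score outside 0..1.
  return Math.min(1, Math.max(0, score));
}
```

Under these assumed weights, two fully agreeing, complete, top-priority sources score about 0.95, and adding a third source raises the score slightly.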
Confidence Tiers:
| Tier | Score | Interpretation |
|---|---|---|
| Exceptional | 0.95+ | Very high agreement, authoritative sources |
| Strong | 0.90–0.95 | Good consensus across multiple sources |
| Good | 0.80–0.90 | Moderate agreement |
| Moderate | 0.65–0.80 | Some disagreement between sources |
| Weak | 0.50–0.65 | Limited agreement |
| Poor | < 0.50 | Significant conflicts |
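If you want to branch on these tiers in application code, a small lookup like the one below may help. It is a sketch based only on the table above: `confidenceTier` is a hypothetical helper, and treating each tier's lower bound as inclusive (so exactly 0.95 is Exceptional and exactly 0.90 is Strong) is an assumption.

```typescript
type ConfidenceTier = "Exceptional" | "Strong" | "Good" | "Moderate" | "Weak" | "Poor";

// Lower bounds from the table, highest first. Inclusive bounds are an assumption.
const TIER_FLOORS: Array<[number, ConfidenceTier]> = [
  [0.95, "Exceptional"],
  [0.9, "Strong"],
  [0.8, "Good"],
  [0.65, "Moderate"],
  [0.5, "Weak"],
];

/** Map a confidence score to the first tier whose floor it reaches. */
function confidenceTier(score: number): ConfidenceTier {
  for (const [floor, tier] of TIER_FLOORS) {
    if (score >= floor) return tier;
  }
  return "Poor";
}

confidenceTier(0.92); // "Strong"
confidenceTier(0.49); // "Poor"
```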
Conflict Detection
Detect and analyze conflicts between metadata sources:
import { ConflictDetector, ConflictDisplayFormatter } from "@colibri-hq/sdk/metadata";
const detector = new ConflictDetector();
const conflicts = detector.detectConflicts(preview);
console.log(`Total conflicts: ${conflicts.summary.totalConflicts}`);
console.log(`Critical: ${conflicts.summary.criticalCount}`);
// Format conflicts for display
const formatter = new ConflictDisplayFormatter();
const formatted = formatter.formatConflicts(conflicts.conflicts);
for (const conflict of formatted) {
  console.log(`Field: ${conflict.fieldName}`);
  console.log(`Severity: ${conflict.severity}`);
  for (const value of conflict.values) {
    console.log(` ${value.source}: ${value.displayValue}`);
  }
}
Error Handling
Graceful Degradation
The aggregator continues if some providers fail. Check the errors map to see which providers encountered issues:
const { results, errors } = await aggregator.searchByISBN(isbn);
if (errors.size > 0) {
  for (const [provider, error] of errors) {
    console.warn(`${provider}: ${error.message}`);
  }
}
// Results from successful providers are still available
console.log(`Got ${results.length} results`);
Retry Behavior
Providers automatically retry on transient failures:
- Network timeouts
- Connection errors
- Server errors (5xx)
Retries use exponential backoff with jitter. Non-retryable errors (4xx client errors) fail immediately.
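The retry policy described above can be sketched as two small functions: one that decides whether an HTTP status is worth retrying, and one that computes the delay before the next attempt. This uses the common "full jitter" variant of exponential backoff; the SDK's exact formula and defaults are not stated here, so `BackoffOptions`, `isRetryableStatus`, and `backoffDelayMs` are hypothetical.

```typescript
interface BackoffOptions {
  baseDelayMs: number; // ceiling for the delay before the first retry
  maxDelayMs: number; // upper bound on any single delay
}

/** 5xx server errors are retried; 4xx client errors are not. */
function isRetryableStatus(status: number): boolean {
  return status >= 500 && status <= 599;
}

/**
 * "Full jitter" backoff: a random delay in [0, min(maxDelayMs, baseDelayMs * 2^attempt)).
 * `attempt` starts at 0. `random` is injectable so the result can be tested deterministically.
 */
function backoffDelayMs(
  attempt: number,
  options: BackoffOptions,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(options.maxDelayMs, options.baseDelayMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}
```

A retry loop would call `backoffDelayMs(attempt, options)` after each retryable failure and sleep for that long before trying again. Timeouts and connection errors carry no HTTP status, so the caller would treat them as retryable directly. The jitter spreads retries out so that many clients recovering from the same outage do not all hit the provider at once.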
Rate Limiting
All providers respect rate limits automatically. The SDK queues requests and waits when necessary:
- Open Library: 100 requests/minute
- WikiData: 60 requests/minute
- Amazon PAAPI: 1 request/second
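For intuition about how a client can stay under limits like these, here is a minimal sliding-window limiter. It illustrates the general technique only and is not how the SDK implements its queue; `SlidingWindowLimiter` and `reserve` are hypothetical names, and the clock is passed in explicitly to keep the example deterministic.

```typescript
/** Sliding-window limiter: at most `limit` requests in any `windowMs` span. */
class SlidingWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private timestamps: number[] = [];

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  /**
   * Ask for a slot at time `now` (ms). Returns 0 and records the request if it may
   * run immediately; otherwise returns how long to wait before asking again.
   */
  reserve(now: number): number {
    // Forget requests that have aged out of the window.
    this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);
    if (this.timestamps.length < this.limit) {
      this.timestamps.push(now);
      return 0;
    }
    // The oldest remaining request determines when the next slot opens.
    const oldest = this.timestamps[0] ?? now;
    return oldest + this.windowMs - now;
  }
}

// The limits quoted above, expressed as limiter instances.
const openLibraryLimiter = new SlidingWindowLimiter(100, 60_000);
const wikiDataLimiter = new SlidingWindowLimiter(60, 60_000);
const amazonLimiter = new SlidingWindowLimiter(1, 1_000);
```

A caller would pass `Date.now()` to `reserve`, wait for the returned number of milliseconds when it is non-zero, and then ask again.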
Performance
Caching
Enable caching to avoid redundant API calls:
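import { MetadataCache } from "@colibri-hq/sdk/metadata";
const cache = new MetadataCache({
  maxSize: 1000,
  ttl: 3_600_000, // 1 hour in milliseconds
});
async function cachedSearchByISBN(isbn: string) {
  // Illustrative key format; any stable string that identifies the query works
  const cacheKey = `isbn:${isbn}`;
  // Check cache before querying
  const cached = await cache.get(cacheKey);
  if (cached) {
    return cached;
  }
  const results = await aggregator.searchByISBN(isbn);
  await cache.set(cacheKey, results);
  return results;
}
Timeouts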
Set a total timeout so that slow providers cannot stall a search:
const aggregator = new MetadataAggregator(providers, {
  timeout: 15000, // 15 seconds total
});
Parallel Queries
The aggregator queries all providers in parallel by default. To cap how many providers take part in a query, pre-select them with selectProviders and its maxProviders option:
const selected = selectProviders(providers, query, "priority", {
  maxProviders: 3, // Only query top 3 providers
});
Advanced Features
Series Management
Find and manage book series:
import { findOrCreateSeries, addWorkToSeries } from "@colibri-hq/sdk";
// Find existing series or create a new one
const series = await findOrCreateSeries(database, "The Wheel of Time", {
  language: "eng",
  userId: "user-123",
});
// Add a work to the series with optional position
await addWorkToSeries(database, workId, series.id, 1); // Position 1
Tag Management
Tags (subjects/genres) are normalized and deduplicated automatically:
import { findOrCreateTags, addTagsToWork } from "@colibri-hq/sdk";
// Batch create or find tags (optimized for multiple tags)
const tags = await findOrCreateTags(database, ["epic fantasy", "magic", "adventure"], {
  userId: "user-123",
});
// Batch link tags to a work
await addTagsToWork(
  database,
  workId,
  tags.map(({ id }) => id),
);
Duplicate Detection
Detect potential duplicates in your library:
import { DuplicateDetector, detectDuplicates } from "@colibri-hq/sdk/metadata";
const detector = new DuplicateDetector({ minSimilarityThreshold: 0.7 });
const duplicates = detector.detectDuplicates(proposedEntry, existingLibrary);
for (const match of duplicates) {
  console.log(`Match: ${match.entry.title}`);
  console.log(`Similarity: ${match.similarity}`);
  console.log(`Type: ${match.matchType}`); // 'exact' | 'likely' | 'possible'
  console.log(`Recommendation: ${match.recommendation}`); // 'skip' | 'merge' | 'review_manually'
}
TypeScript Support
The SDK is written in TypeScript and exports all types:
import type {
  MetadataRecord,
  MetadataProvider,
  TitleQuery,
  CreatorQuery,
  MultiCriteriaQuery,
  AggregatedResult,
  ConfidenceFactors,
} from "@colibri-hq/sdk/metadata";
Example: Complete Enrichment Flow
Extract metadata from a file and enrich it with external sources:
import {
  EmbeddedMetadataProvider,
  MetadataAggregator,
  OpenLibraryMetadataProvider,
  WikiDataMetadataProvider,
} from "@colibri-hq/sdk/metadata";
import type { MetadataRecord } from "@colibri-hq/sdk/metadata";
async function enrichEbook(file: File) {
  // Extract embedded metadata
  const embedded = new EmbeddedMetadataProvider();
  const fileRecord = await embedded.extractFromFile(file);
  // Search external sources
  const aggregator = new MetadataAggregator([
    new OpenLibraryMetadataProvider(),
    new WikiDataMetadataProvider(),
  ]);
  // Typed explicitly so the empty array is not inferred as any[]
  let externalRecords: MetadataRecord[] = [];
  if (fileRecord.isbn?.[0]) {
    const { results } = await aggregator.searchByISBN(fileRecord.isbn[0]);
    externalRecords = results;
  } else if (fileRecord.title) {
    const { results } = await aggregator.searchByTitle({ title: fileRecord.title });
    externalRecords = results;
  }
  // Merge: prefer embedded for core fields, external for enrichment
  return {
    title: fileRecord.title,
    authors: fileRecord.authors,
    isbn: fileRecord.isbn,
    pageCount: fileRecord.pageCount,
    // Enrich with external data
    subjects: externalRecords[0]?.subjects,
    description: externalRecords[0]?.description,
    series: externalRecords[0]?.series,
  };
}
Related Documentation
- Ingestion SDK - Ebook ingestion pipeline
- SDK Overview - Core SDK features
- Open Library Client - Open Library API client