Indexes

3FS (distributed filesystem), Distributed Filesystems

aborts (transactions), Transactions, Atomicity
- cascading, No dirty reads
- in two-phase commit, Two-Phase Commit (2PC)
- performance of optimistic concurrency control, Performance of serializable snapshot isolation
- retrying aborted transactions, Handling errors and aborts
abstraction, Layering of cloud services, Simplicity: Managing Complexity, Data Models and Query Languages, Transactions, Summary
accidental complexity, Simplicity: Managing Complexity
accountability, Responsibility and Accountability
accounting (financial data), Summary, Advantages of immutable events
Accumulo (database)
- wide-column data model, Data locality for reads and writes, Column Compression
ACID properties (transactions), The Meaning of ACID
- atomicity, Atomicity, Single-Object and Multi-Object Operations
- consistency, Consistency, Maintaining integrity in the face of software bugs
- durability, Making B-trees reliable, Durability
- isolation, Isolation, Single-Object and Multi-Object Operations
acknowledgements (messaging), Acknowledgments and redelivery
active/active replication (see multi-leader replication)
active/passive replication (see leader-based replication)
ActiveMQ (messaging), Message brokers, Message brokers compared to databases
- distributed transaction support, XA transactions
ActiveRecord (object-relational mapper), Object-relational mapping (ORM), Handling errors and aborts
activity (workflows) (see workflow engines)
actor model, Distributed actor frameworks
- (see also event-driven architecture)
- comparison to stream processing, Event-Driven Architectures and RPC
adaptive capacity, Skewed Workloads and Relieving Hot Spots
Advanced Message Queuing Protocol (see AMQP)
aerospace systems, Byzantine Faults
Aerospike (database)
- strong consistency mode, Single-object writes
AGE (graph database), The Cypher Query Language
aggregation
- data cubes and materialized views, Materialized Views and Data Cubes
- in batch processes, Sorting Versus In-memory Aggregation
- in stream processes, Stream analytics
aggregation pipeline (MongoDB), Normalization, Denormalization, and Joins, Query languages for documents
Agile, Evolvability: Making Change Easy
- minimizing irreversibility, Batch Processing, Reprocessing data for application evolution
- moving faster with confidence, The end-to-end argument again
agreement, Single-value consensus, Atomic commitment as consensus
- (see also consensus)
AI (artificial intelligence) (see machine learning)
AI Act (European Union), Data Systems, Law, and Society
AirByte, Data Warehousing
Airflow (workflow scheduler), Durable Execution and Workflows, Batch Processing, Scheduling Workflows
- cloud data warehouse integration, Query languages
- use for ETL, Extract–Transform–Load (ETL)
Akamai
- response time study, Average, Median, and Percentiles
algorithms
- algorithm correctness, Defining the correctness of an algorithm
- B-trees, B-Trees-B-tree variants
- for distributed systems, System Model and Reality
- mergesort, Constructing and merging SSTables, Shuffling Data
- scheduling, Resource Allocation
- SSTables and LSM-trees, The SSTable file format-Compaction strategies
all-to-all replication topologies, Multi-leader replication topologies
AllegroGraph (database), Graph-Like Data Models
- SPARQL query language, The SPARQL query language
ALTER TABLE statement (SQL), Schema flexibility in the document model, Encoding and Evolution
Amazon
- Dynamo (see Dynamo (database))
- response time study, Average, Median, and Percentiles
Amazon Web Services (AWS)
- Aurora (see Aurora (cloud database))
- ClockBound (see ClockBound (time sync))
- correctness testing, Formal Methods and Randomized Testing
- DynamoDB (see DynamoDB (database))
- EBS (see EBS (virtual block device))
- Kinesis (see Kinesis (messaging))
- Neptune (see Neptune (graph database))
- network reliability, Network Faults in Practice
- S3 (see S3 (object storage))
amplification
- of bias, Bias and Discrimination
- of failures, Maintaining derived state
- of tail latency, Use of Response Time Metrics, Local Secondary Indexes
- write amplification, Write amplification
AMQP (Advanced Message Queuing Protocol), Message brokers compared to databases
- (see also messaging systems)
- comparison to log-based messaging, Logs compared to traditional messaging, Replaying old messages
- message ordering, Acknowledgments and redelivery
analytical systems, Operational Versus Analytical Systems
- as derived data systems, Systems of Record and Derived Data
- ETL from operational systems, Data Warehousing
- governance, Beyond the data lake
analytics, Operational Versus Analytical Systems-Systems of Record and Derived Data
- comparison to transaction processing, Characterizing Transaction Processing and Analytics
- data normalization, Trade-offs of normalization
- data warehousing (see data warehousing)
- predictive (see predictive analytics)
- relation to batch processing, Analytics
- schemas for, Stars and Snowflakes: Schemas for Analytics
- snapshot isolation for queries, Snapshot Isolation and Repeatable Read
- stream analytics, Stream analytics
analytics engineering, Operational Versus Analytical Systems
anti-entropy, Catching up on missed writes
Antithesis (deterministic simulation testing), Deterministic simulation testing
Apache Accumulo (see Accumulo)
Apache ActiveMQ (see ActiveMQ)
Apache AGE (see AGE)
Apache Arrow (see Arrow (data format))
Apache Avro (see Avro)
Apache Beam (see Beam)
Apache BookKeeper (see BookKeeper)
Apache Cassandra (see Cassandra)
Apache Curator (see Curator)
Apache DataFusion (see DataFusion (query engine))
Apache Druid (see Druid (database))
Apache Flink (see Flink (processing framework))
Apache HBase (see HBase)
Apache Iceberg (see Iceberg (table format))
Apache Jena (see Jena)
Apache Kafka (see Kafka)
Apache Lucene (see Lucene)
Apache Oozie (see Oozie (workflow scheduler))
Apache ORC (see ORC (data format))
Apache Parquet (see Parquet (data format))
Apache Pig (query language), Query languages
Apache Pinot (see Pinot (database))
Apache Pulsar (see Pulsar)
Apache Qpid (see Qpid)
Apache Samza (see Samza)
Apache Solr (see Solr)
Apache Spark (see Spark (processing framework))
Apache Storm (see Storm)
Apache Superset (see Superset (data visualization software))
Apache Thrift (see Thrift)
Apache ZooKeeper (see ZooKeeper)
Apama (stream analytics), Complex event processing
append-only files (see logs)
Application Programming Interfaces (APIs), Data Models and Query Languages
- for change streams, API support for change streams
- for distributed transactions, XA transactions
- for services, Dataflow Through Services: REST and RPC-Data encoding and evolution for RPC
  - (see also services)
  - evolvability, Data encoding and evolution for RPC
  - RESTful, Web services
application state (see state)
approximate search (see similarity search)
archival storage, data from databases, Archival storage
arcs (see edges)
ArcticDB (database), DataFrames, Matrices, and Arrays
arithmetic mean, Average, Median, and Percentiles
arrays
- array databases, DataFrames, Matrices, and Arrays
- multidimensional, DataFrames, Matrices, and Arrays
Arrow (data format), Column-Oriented Storage, DataFrames
artificial intelligence (see machine learning)
ASCII text, Protocol Buffers
ASN.1 (schema language), The Merits of Schemas
associative table, Many-to-One and Many-to-Many Relationships, Property Graphs
asynchronous networks, Unreliable Networks, Glossary
- comparison to synchronous networks, Synchronous Versus Asynchronous Networks
- system model, System Model and Reality
asynchronous replication, Synchronous Versus Asynchronous Replication, Glossary
- data loss on failover, Leader failure: Failover
- reads from asynchronous follower, Problems with Replication Lag
- with multiple leaders, Multi-Leader Replication
Asynchronous Transfer Mode (ATM), Can we not simply make network delays predictable?
atomic broadcast, Shared logs as consensus
atomic clocks, Clock readings with a confidence interval, Synchronized clocks for global snapshots
- (see also clocks)
atomicity (concurrency), Glossary
- atomic increment, Single-object writes
- compare-and-set (CAS), Conditional writes (compare-and-set), What Makes a System Linearizable?
  - (see also compare-and-set (CAS))
- denormalized data, Trade-offs of normalization
- fetch-and-add/increment, ID Generators and Logical Clocks, Consensus, Fetch-and-add as consensus
- write operations, Atomic write operations
atomicity (transactions), Atomicity, Single-Object and Multi-Object Operations, Glossary
- atomic commit
  - avoiding, Multi-shard request processing, Coordination-avoiding data systems
  - blocking and nonblocking, Three-phase commit
  - in stream processing, Exactly-once message processing, Exactly-once message processing revisited, Atomic commit revisited
  - maintaining derived data, Keeping Systems in Sync
- distributed transactions, Distributed Transactions-Exactly-once message processing revisited
- for multi-object transactions, Single-Object and Multi-Object Operations
- for single-object writes, Single-object writes
- relation to consensus, Atomic commitment as consensus
auditability, Trust, but Verify-Tools for auditable data systems
- designing for, Designing for auditability
- self-auditing systems, Don’t just blindly trust what they promise
- through immutability, Advantages of immutable events
- tools for auditable data systems, Tools for auditable data systems
Aurora (cloud database), Cloud-Native System Architecture
Aurora DSQL (database)
- snapshot isolation support, Snapshot Isolation and Repeatable Read
auto-scaling, Operations: Automatic or Manual Rebalancing
Automerge (CRDT library), Pros and cons of sync engines
availability, Reliability and Fault Tolerance
- (see also fault tolerance)
- in CAP theorem, The CAP theorem
- in leader election, Subtleties of consensus
- in service level agreements (SLAs), Use of Response Time Metrics
availability zones, Tolerating hardware faults through redundancy, Reading Your Own Writes
Avro (data format), Avro-Dynamically generated schemas
- dynamically generated schemas, Dynamically generated schemas
- object container files, But what is the writer’s schema?, Archival storage
- reader determining writer’s schema, But what is the writer’s schema?
- schema evolution, The writer’s schema and the reader’s schema
- use in batch processing, MapReduce
awk (Unix tool), Simple Log Analysis, Distributed Job Orchestration
Axon Framework, Event Sourcing and CQRS
Azkaban (workflow scheduler), Batch Processing
Azure Blob Storage (object storage), Layering of cloud services, Setting Up New Followers
- conditional headers, Fencing off zombies and delayed requests
Azure managed disks, Separation of storage and compute
Azure SQL DB (database), Cloud-Native System Architecture
Azure Storage, Object Stores
Azure Synapse Analytics (database), Cloud-Native System Architecture
Azure Virtual Machines
- spot virtual machines, Handling Faults

Hadoop (data infrastructure)
- comparison to distributed databases, Batch Processing
- MapReduce (see MapReduce)
- NodeManager, Distributed Job Orchestration
- YARN (see YARN (job scheduler))
HANA (see SAP HANA (database))
happens-before relation, The “happens-before” relation and concurrency
hard disks
- access patterns, Sequential versus random writes
- detecting corruption, The end-to-end argument, Don’t just blindly trust what they promise
- faults in, Hardware and Software Faults, Durability
- sequential vs. random writes, Sequential versus random writes
- sequential write throughput, Disk space usage
hardware faults, Hardware and Software Faults
hash function
- in Bloom filters, Bloom filters
hash join
- in stream processing, Stream-table join (stream enrichment)
hash sharding, Sharding by Hash of Key-Consistent hashing, Summary
- consistent hashing, Consistent hashing
- problems with hash mod N, Hash modulo number of nodes
- range queries, Sharding by hash range
- suitable hash functions, Sharding by Hash of Key
- with fixed number of shards, Fixed number of shards
hash tables, Log-Structured Storage
Hazelcast (in-memory data grid)
- FencedLock, Fencing off zombies and delayed requests
- Flake ID Generator, ID Generators and Logical Clocks
HBase (database)
- bug due to lack of fencing, Distributed Locks and Leases
- key-range sharding, Sharding by Key Range
- log-structured storage, Constructing and merging SSTables
- regions (sharding), Sharding
- request routing, Request Routing
- size-tiered compaction, Compaction strategies
- wide-column data model, Data locality for reads and writes, Column Compression
HDFS (Hadoop Distributed File System), Batch Processing, Distributed Filesystems
- (see also distributed filesystems)
- checking data integrity, Don’t just blindly trust what they promise
- DataNode, Distributed Filesystems
- NameNode, Distributed Filesystems
- use in MapReduce, MapReduce
- workflow example, Scheduling Workflows
HdrHistogram (numerical library), Use of Response Time Metrics
head (Unix tool), Simple Log Analysis, Distributed Job Orchestration
head vertex (property graphs), Property Graphs
head-of-line blocking, Latency and Response Time
heap files (databases), Storing values within the index
- in multiversion concurrency control, Multi-version concurrency control (MVCC)
heat management, Skewed Workloads and Relieving Hot Spots
hedged requests, Single-Leader Versus Leaderless Replication Performance
heterogeneous distributed transactions, Distributed Transactions Across Different Systems, Problems with XA transactions
heuristic decisions (in 2PC), Recovering from coordinator failure
Hex (notebook), Machine Learning
hexagons
- for geospatial indexing, Multidimensional and Full-Text Indexes
Hibernate (object-relational mapper), Object-relational mapping (ORM)
hierarchical model, Relational Model versus Document Model
hierarchical navigable small world (vector index), Vector Embeddings
hierarchical queries (see recursive common table expressions)
high availability (see fault tolerance)
high-frequency trading, Clock Synchronization and Accuracy
high-performance computing (HPC), Cloud Computing Versus Supercomputing
hinted handoff (leaderless replication), Catching up on missed writes
histograms, Use of Response Time Metrics
Hive (data warehouse), Cloud Data Warehouses
- query optimizer, Query languages
HNSW (vector index), Vector Embeddings
hopping windows (stream processing), Types of windows
- (see also windows)
Hoptimator (query engine), The meta-database of everything
Horizon scandal, Humans and Reliability
- lack of transactions, Transactions
horizontal scaling (see scaling out)
- by sharding, Pros and Cons of Sharding
HornetQ (messaging), Message brokers, Message brokers compared to databases
- distributed transaction support, XA transactions
hot keys, Sharding of Key-Value Data
hot spots, Sharding of Key-Value Data
- due to celebrities, Skewed Workloads and Relieving Hot Spots
- for time-series data, Sharding by Key Range
- relieving, Skewed Workloads and Relieving Hot Spots
hot standbys (see leader-based replication)
HTAP (see hybrid transactional/analytic processing)
HTTP, use in APIs (see services)
human errors, Humans and Reliability, Network Faults in Practice, Batch Processing
hybrid logical clocks, Hybrid logical clocks
hybrid transactional/analytic processing, Data Warehousing, Data Storage for Analytics
hydrating IDs (join), Denormalization in the social networking case study
hypergraph, Property Graphs
HyperLogLog (algorithm), Stream analytics

WAL (write-ahead log), Making B-trees reliable
WAL-G (backup tool), Setting Up New Followers
WarpStream (messaging), Disk space usage
web services (see services)
webhooks, Direct messaging from producers to consumers
webMethods (messaging), Message brokers
WebSocket (protocol), Pushing state changes to clients
wide-column data model, Data locality for reads and writes
- versus column-oriented storage, Column Compression
windows (stream processing), Stream analytics, Reasoning About Time-Types of windows
- infinite windows for changelogs, Maintaining materialized views, Stream-table join (stream enrichment)
- knowing when all events have arrived, Handling straggler events
- stream joins within a window, Stream-stream join (window join)
- types of windows, Types of windows
WITH RECURSIVE syntax (SQL), Graph Queries in SQL
Word2Vec (language model), Vector Embeddings
workflow engines, Durable Execution and Workflows
- Airflow (see Airflow (workflow scheduler))
- batch processing, Scheduling Workflows
- Camunda (see Camunda (workflow engine))
- Dagster (see Dagster (workflow scheduler))
- durable execution, Durable Execution and Workflows
- ETL (see ETL (extract-transform-load))
- executor, Durable Execution and Workflows
- orchestrators, Durable Execution and Workflows, Batch Processing
- Orkes (see Orkes (workflow engine))
- Prefect (see Prefect (workflow scheduler))
- reliance on determinism, Deterministic simulation testing
- Restate (see Restate (workflow engine))
- Temporal (see Temporal (workflow engine))
working set, Sorting Versus In-memory Aggregation
write amplification, Write amplification
write path (derived data), Observing Derived State
write skew (transaction isolation), Write Skew and Phantoms-Materializing conflicts
- characterizing, Write Skew and Phantoms-Phantoms causing write skew, Decisions based on an outdated premise
- examples of, Write Skew and Phantoms, More examples of write skew
- materializing conflicts, Materializing conflicts
- occurrence in practice, Maintaining integrity in the face of software bugs
- phantoms, Phantoms causing write skew
- preventing
  - in snapshot isolation, Decisions based on an outdated premise-Detecting writes that affect prior reads
  - in two-phase locking, Predicate locks-Index-range locks
  - options for, Characterizing write skew
write-ahead log (WAL), Making B-trees reliable, Write-ahead log (WAL) shipping
- in durable execution, Durable execution
writes (database)
- atomic write operations, Atomic write operations
- detecting writes affecting prior reads, Detecting writes that affect prior reads
- preventing dirty writes with read committed, No dirty writes
WS-* framework, The problems with remote procedure calls (RPCs)
WS-AtomicTransaction (2PC), Two-Phase Commit (2PC)
