Maximizing Performance with Materialized Views: A Case Study
What are Materialized Views?
Materialized views are pre-computed tables in databases like Oracle and PostgreSQL that store SELECT query results, improving query performance by allowing the database to retrieve data directly from the table. Materialized views can be refreshed periodically or on demand and can be configured for an incremental refresh. They're a useful tool for optimizing query performance, especially for complex or time-consuming SELECT queries.
Why are they faster than views?
A materialized view is a saved query plus its results stored in a table, whereas a standard view is just a saved query. Since the results are saved, querying materialized views is frequently quicker.
It's worth noting that materialized views are not always faster than views. In some cases, it may be more efficient to execute the view's query from scratch rather than retrieving the data from a temporary table. The relative performance of materialized views and views depends on various factors, such as the complexity of the view's query, the amount of data involved, and the hardware and software configuration of the database.
Query:
CREATE MATERIALIZED VIEW view_name
AS query
WITH DATA;
# WITH DATA parameter: If the query has to load the data (the query results)
at once during the object creation.
Pros of Materialized Views
- When the data is synchronized directly with the database, the time is only spent on the INSERT, UPDATE, and DELETE operations.
- Network load is reduced
- The speed becomes much higher
- Overall productiveness increases.
- Low-latency query response times
- High concurrency of queries.
- PostgreSQL materialization provides ACID guarantees
Limitations of Materialized Views
- They are only updated on demand.
- The whole materialized view must be updated; there is no way to only update a single stale row.
Views vs Materialized Views: When are both preferred?
Views are used when you are required fresh data. If your query is rather fast to execute, or your situation allows you to tolerate the slow performance hence brings you the most up-to-date results.
Materialized Views are used when you are required performance speed. When the query is heavy and slow, and time delays are unacceptable, materialized views are a perfect opportunity for this situation. It does not keep the data up-to-date all the time. You can’t insert data into it either, so have to refresh to get the updated data using the REFRESH command. While refreshing, it’s locked, you can’t run queries on it. To tackle this situation, you can CONCURRENTLY keyword to simultaneously work on it while it's updating, this is called view maintenance.
Use in PostgreSQL
A common usage in PostgreSQL is faster access to data brought across from a remote system through a foreign data wrapper.
Refresh Methods in PostgreSQL
There are 4 methods to refresh materialized views in PostgreSQL:
- Manual refresh
- Automatic refresh
- Incremental refresh
- On demand refresh
Method # 1: Manual Refresh — This is the default refresh method for materialized views. To refresh a materialized view manually, you can use the REFRESH MATERIALIZED VIEW statement. This will execute the SELECT query that defines the materialized view and store the results in the materialized view's data table.
Here's an example of the REFRESH MATERIALIZED VIEW statement:
REFRESH MATERIALIZED VIEW view_name;
How a manual refresh works in PostgreSQL:
- The REFRESH MATERIALIZED VIEW statement is executed.
- PostgreSQL locks the materialized view's data table and prevents queries from accessing it.
- The SELECT query that defines the materialized view is executed.
- The results of the SELECT query are stored in the materialized view's data table, replacing the existing data.
- The lock on the materialized view's data table is released, allowing queries to access the updated data.
So, while the data in the materialized view's data table is replaced during a manual refresh, the table itself is not truncated.
Method # 2: Automatic Refresh — You can configure a materialized view to refresh automatically using the REFRESH MATERIALIZED VIEW CONCURRENTLY statement. This will refresh the materialized view in the background while allowing queries to continue accessing the old data. Once the refresh is complete, the materialized view's data table will be replaced with the new data.
Here's an example of the REFRESH MATERIALIZED VIEW CONCURRENTLY statement:
REFRESH MATERIALIZED VIEW view_name CONCURRENTLY;
Method # 3: Incremental Refresh — This will refresh the materialized view by only querying the data that has changed since the last refresh. Incremental refresh can be more efficient than a full refresh, especially if the data in the underlying table is frequently updated.
PostgreSQL does not have built-in support for incremental refresh for materialized views, but there are few third party extension that can be used for that purpose. One of the most widely used is pg_repack extension which support incremental refresh functionality. Another extension is pg_incremental_view that allows you to define a materialized view with a trigger and a rules to do incremental updates.
Here's an example of the REFRESH MATERIALIZED VIEW INCREMENTAL statement:
REFRESH MATERIALIZED VIEW view_name INCREMENTAL;
Method # 4: On demand Refresh — You can configure a materialized view to refresh on demand using the REFRESH MATERIALIZED VIEW ON DEMAND statement. This will refresh the materialized view whenever it is queried. This can be useful if you want to ensure that the materialized view always contains the most up-to-date data, but it can also have a negative impact on query performance.
Here's an example of the REFRESH MATERIALIZED VIEW ON DEMAND statement:
REFRESH MATERIALIZED VIEW view_name ON DEMAND;
Materialized Views Strategies:
After learning about refresh methods in PostgreSQL, we saw that the data doesn't remain updated all the time so there are some strategies built to tackle the performance of materialized views. Two new methods were thus found:
Eager Materialized Views: This modifies the MATERIALIZED VIEW right after the source data is modified. It also uses triggers, but instead of using an intermediate table to store row identifiers it executes an UPDATE, INSERT or DELETE statement directly on the MATERIALIZED VIEW's table.
Lazy Materialized Views: It works by storing identifiers of modified rows in a separate table and then, when it's time to refresh, reading that table and applying changes to the target one. They're called "lazy" since you have to explicitly "tell" the database when to refresh the target table's contents.
There's a great article by Jack Christensen about those strategies. Now to test their performance against views and materialized views, I ran the queries and created all triggers provided by Jack Christensen and it showed absolute performance difference.
The values are taken on an average of 5 query runs and amazing results were found. The materialized views offer the best performance only taking 223.2ms average but only if stale data is acceptable.
Coming to the eager materialized views, it offers the absolute best read performance only taking 476ms on average, but can only guarantee freshness if rows do not go stale due to the passage of time.
Now, while running queries for lazy materialized view, I found its impressive impact, offering almost as good read performance as eager materialized views but you must note that the first time the query runs it takes about 6176ms because it is caching the results. Subsequent runs only take about 135ms on average.
Automate refresh of materialized views
There are two ways to automate the execution of the REFRESH command:
- Create a schedule
- Set the database triggers
Refresh Methods in Oracle
There are 3 methods to refresh materialized views in Oracle:
- Complete Refresh
- Fast Refresh
- Partition Change Tracking (PCT) Refresh
Method # 1: Complete Refresh — A complete refresh may be requested at any time during the life of any materialized view. The refresh involves reading the detail tables to compute the results for the materialized view. This can be a very time-consuming process, especially if there are huge amounts of data to be read and processed.
Method # 2: Fast Refresh — Fast refresh of your materialized views is usually efficient because instead of having to recompute the entire materialized view, the changes are applied to the existing data. Thus, processing only the changes can result in a very fast refresh time.
Here's how fast refresh works in Oracle:
- When you create a materialized view, you can specify that it should use fast refresh by setting the REFRESH FAST option.
- Oracle maintains a record of the data changes made to the base tables of the materialized view in the materialized view logs.
- When you refresh the materialized view, Oracle compares the data in the materialized view logs with the data in the materialized view. If there are any differences, Oracle updates the materialized view with the new data from the materialized view logs.
- If there are no changes to the data in the materialized view logs, Oracle skips the refresh and the materialized view remains unchanged.
Method # 3: Partition Change Tracking (PCT) Refresh — When there have been some partition maintenance operations on the detail tables, this is the only method of fast refresh that can be used.
Here's how PCT Refresh works:
- When you create a materialized view, you can specify that it should use PCT Refresh by setting the REFRESH FAST option.
- Oracle maintains a record of the data changes made to the underlying table or tables of the materialized view. This record is known as the change tracking (CT) file.
- When you refresh the materialized view, Oracle compares the data in the CT file with the data in the materialized view. If there are any differences, Oracle updates the materialized view with the new data from the CT file.
- If there are no changes to the data in the CT file, Oracle skips the refresh and the materialized view remains unchanged.
In addition, PCT Refresh requires the underlying table to have row-level change tracking enabled.
You might like:
Thanks for reading!
To summarize, We have looked into the concept of materialized views. In the next blog, we will learn about the OLTP and OLAP differences.
If you enjoyed reading this blog post and want to show your support, please consider clicking on the link. Every click helps me to continue creating valuable content for my readers. Thank you for your support!
See you next time,
@TechAE
No comments:
Post a Comment
Thank you for submitting your comment! We appreciate your feedback and will review it as soon as possible. Please note that all comments are moderated and may take some time to appear on the site. We ask that you please keep your comments respectful and refrain from using offensive language or making personal attacks. Thank you for contributing to the conversation!