When clients need data, they need it now. Delivering reliable data quickly requires the best technology available. CoreLogic draws its data from hundreds of different sources, each with its own rules, quirks and database schema. To download and standardize that data for our customers, we used to spend considerable effort analyzing each database and writing a custom ETL process for it. That all changed when we began adopting the latest innovations in data processing, particularly Kafka, Spring Cloud Data Flow and Elasticsearch. This modern stack allows our developers to deliver standardized data faster than ever before.
Next-Gen Technology
Kafka is a stream-processing platform that helps us standardize the enormous volume of data we receive from numerous database backends and aggregate it all into one homogeneous system that is fast, efficient and, most importantly, fault tolerant. Because Kafka scales horizontally, we can handle any level of load simply by adding brokers. Kafka considerably bolsters our ability to deliver standardized data quickly, but it's not the only tool we're using to improve the process. Data flows through Kafka topics and is then indexed into Elasticsearch.
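As a rough illustration of that last hop, a minimal consumer might read from a topic and bulk-index each batch into Elasticsearch. This sketch assumes the kafka-python and elasticsearch Python client libraries; the topic, index and field names are hypothetical placeholders, not our production schema.

    import json

    from kafka import KafkaConsumer
    from elasticsearch import Elasticsearch, helpers

    consumer = KafkaConsumer(
        "standardized-records",                  # hypothetical topic name
        bootstrap_servers=["localhost:9092"],
        group_id="es-indexer",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    es = Elasticsearch(["http://localhost:9200"])

    def to_actions(messages):
        # Turn each consumed record into an Elasticsearch bulk-index action.
        for msg in messages:
            yield {
                "_index": "records",             # hypothetical index name
                "_id": msg.value["id"],          # assumes each record carries an "id"
                "_source": msg.value,
            }

    while True:
        # Poll a batch from Kafka and bulk-index it; batching keeps throughput
        # high instead of issuing one write per message.
        batches = consumer.poll(timeout_ms=1000)
        for _, messages in batches.items():
            helpers.bulk(es, to_actions(messages))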
Elasticsearch is a NoSQL datastore that allows for extremely fast reads and writes on a horizontally scaling platform. If we need more capacity, we can spin up and manage Elasticsearch clusters in Amazon Web Services or Google Cloud Platform with minimal effort.
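That horizontal scaling works through sharding: an index is split into primary shards spread across the cluster's nodes, with replica shards providing fault tolerance. A minimal sketch (the index name and counts here are illustrative only):

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])

    # Six primary shards spread reads and writes across nodes; one replica of
    # each shard keeps data available if a node is lost. Illustrative values.
    es.indices.create(
        index="records",
        body={"settings": {"number_of_shards": 6, "number_of_replicas": 1}},
    )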
But Elasticsearch is more than just a datastore. The Elastic Stack ecosystem includes the data visualization platform Kibana, which allows us not only to store and present data very quickly, but also to run machine-learning algorithms that analyze the data and report anomalies. Rather than performing manual QA to remediate problems in the data, we can set watchers and run reports that look for problems proactively, which vastly streamlines the QA process.
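For illustration, a watcher along these lines could run a scheduled search for records missing a required field and raise an alert when any turn up. This is a hypothetical sketch using Elastic's X-Pack Watcher through the Python client; the watch, index and field names are made up.

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])

    # Hypothetical watch: every 10 minutes, search for records indexed
    # without a list price and log an alert if any are found.
    watch = {
        "trigger": {"schedule": {"interval": "10m"}},
        "input": {
            "search": {
                "request": {
                    "indices": ["records"],
                    "body": {
                        "query": {
                            "bool": {"must_not": {"exists": {"field": "ListPrice"}}}
                        }
                    },
                }
            }
        },
        "condition": {"compare": {"ctx.payload.hits.total": {"gt": 0}}},
        "actions": {
            "log_anomaly": {
                "logging": {
                    "text": "{{ctx.payload.hits.total}} records are missing ListPrice"
                }
            }
        },
    }

    es.watcher.put_watch(id="missing-list-price", body=watch)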
The end result of these innovations is faster, more reliable data getting into the hands of end users when they need it.
Harnessing New Capabilities
One example of how modern technology is helping us find new ways to serve end users is Trestle, an MLS data distribution platform that provides a single solution for all 130+ Matrix systems and their nearly 1 million users. Each of those users relies on us to deliver their data to various technology providers as quickly and accurately as possible.
Cleaning, aggregating, standardizing and redistributing MLS data from hundreds of servers, Trestle has the capacity to handle hundreds of requests per second. Doing that quickly requires enormous computing power, and Trestle handles the load by using Pivotal Cloud Foundry to abstract away our cloud providers, currently Amazon Web Services and Google Cloud Platform. As with Kafka and Elasticsearch, our Pivotal infrastructure gives us the reliability of fault-tolerant horizontal scaling.
Because of our backend technology, we can meet RESO standards much more nimbly and make real-time updates in minutes rather than days. This speed of turnaround is particularly useful for MLS data, where end users expect to have accurate results within five minutes of a change in the source system.
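To make that concrete, a downstream consumer typically stays in sync by polling a RESO Web API (OData) feed on the standard ModificationTimestamp field. The sketch below is a hypothetical example using the requests library; the endpoint URL, token and printed fields are placeholders, not Trestle's actual configuration.

    import requests

    # Hypothetical RESO Web API (OData) replication poll: fetch Property
    # records modified since the last sync. Endpoint and token are placeholders.
    BASE_URL = "https://api.example.com/odata/Property"
    last_sync = "2019-01-01T00:00:00Z"

    resp = requests.get(
        BASE_URL,
        params={
            "$filter": f"ModificationTimestamp gt {last_sync}",
            "$orderby": "ModificationTimestamp",
            "$top": 200,
        },
        headers={"Authorization": "Bearer <access-token>"},
    )
    resp.raise_for_status()
    for record in resp.json().get("value", []):
        # In a real sync, each changed listing would be upserted downstream.
        print(record.get("ListingKey"), record.get("ModificationTimestamp"))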
Trestle is just one example of how innovations in technology are revolutionizing what can be done with data. Keep an eye out for further innovations in the coming years.
Written by Al McElmon, Senior Leader, Software Engineering