“Cloud Wisdom Weekly: for tech companies and startups” is a new blog series that answers the most common questions tech and startup customers ask about building apps faster, smarter, and more affordably. In this installment, Julianne Cuneo, a Google Cloud Big Data & Analytics consultant, discusses how to use BigQuery effectively.
Working with large amounts of data, such as those found in traditional data warehouses or data lakes, can be complex, costly, and time-consuming, and can require specialized skills that are difficult to source. Companies must overcome these challenges to compete in today’s data-driven and customer-centric marketplaces.
Analyzing data at scale while managing costs and resources is critical, and many companies are turning to the cloud to strike the right balance. This article discusses how startups and growing tech companies can use BigQuery to innovate, and shares tips and tricks for making the most of Google’s enterprise cloud data warehouse.
Optimizing data management and analysis
Companies often rush to load data and run queries in order to see how a new technology works. While this is a good way to get a quick proof of concept or evaluation, it won’t guarantee long-term success. Instead, think carefully about your business, security, and budgetary needs. The tips below will help you build a solid, scalable foundation, with specific examples of how BigQuery can be used to optimize a data platform architecture.
1. Scale storage and compute independently
One of the biggest challenges in managing large amounts of data is having the right storage capacity. Even if you have the budget to store the data, analyzing it and extracting value from it can be difficult. A serverless architecture can help overcome these challenges.
Serverless platforms like BigQuery let you pay for compute and storage separately, so you can scale each one up or down according to your data requirements. This makes storing large amounts of data more affordable than with bundled services, where storage and compute must be purchased together and getting more of one means paying for more of both.
A second advantage is that you can store more data and generate more insights. BigQuery’s scalable compute capability allows you to query terabytes or even petabytes of data in a single request.
Combined, these capabilities allow you to scale analytics efforts based on your needs rather than on a predefined amount of storage or compute resources.
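To make the idea of independent storage and compute billing concrete, here is a minimal sketch in Python. The two unit prices are hypothetical placeholders chosen for easy arithmetic, not BigQuery’s actual rates, and the `MonthlyUsage` and `decoupled_cost` names are invented for this illustration.

```python
from dataclasses import dataclass

# Hypothetical unit prices for illustration only; NOT BigQuery's actual rates.
STORAGE_PRICE_PER_TB_MONTH = 20.0  # assumed cost to store 1 TB for one month
QUERY_PRICE_PER_TB_SCANNED = 5.0   # assumed cost to scan 1 TB with queries


@dataclass
class MonthlyUsage:
    stored_tb: float   # data kept in the warehouse
    scanned_tb: float  # data read by queries during the month


def decoupled_cost(usage: MonthlyUsage) -> float:
    """Storage and compute are metered independently, then summed."""
    storage = usage.stored_tb * STORAGE_PRICE_PER_TB_MONTH
    compute = usage.scanned_tb * QUERY_PRICE_PER_TB_SCANNED
    return storage + compute


# A team that stores a lot but queries rarely pays mostly for storage...
archive_heavy = MonthlyUsage(stored_tb=100, scanned_tb=2)
# ...while a team that stores little but queries heavily pays mostly for compute.
query_heavy = MonthlyUsage(stored_tb=5, scanned_tb=200)

print(decoupled_cost(archive_heavy))  # 2010.0
print(decoupled_cost(query_heavy))    # 1100.0
```

Neither workload pays for capacity it does not use, which is the point of scaling the two dimensions separately.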
2. Organize storage and data carefully
Data management and analytics are not complete without ensuring that the right data is accessible at the right time and in a secure manner. Planning how your resources are organized up front saves time and helps avoid security, billing, and workflow problems. Key design considerations for resource organization in BigQuery are:
- Datasets and the objects they contain (e.g. views, tables, ML models, etc.) can belong to only one project. This is the project to which their storage costs are billed. Keeping this in mind will help you decide whether to use a centralized data warehouse, assign data marts to individual projects, or combine both approaches.
- BigQuery lets you control access to objects at the table, row, and column levels. Take this into account when designing your storage layout (e.g. group closely related objects in the same dataset to simplify access grants).
3. Optimize compute cost and performance across teams and use cases
Some use cases require precise cost control or resource planning in order to meet strict service-level agreements (SLAs). In BigQuery, for example, data stored in one project can be queried from other projects, and compute resources are billed to the project that runs the query. To track query usage more precisely, you can create separate projects for different teams and use cases (e.g. finance, sales, or data science).
In addition to segmenting compute projects by team or use case for billing purposes, you should consider how to manage compute resources across projects. BigQuery lets you choose between an on-demand and a flat-rate billing model, or mix and match the two to balance on-demand efficiency with flat-rate predictability. Under flat-rate pricing, “slot commitments” are dedicated compute resources that can be further divided into smaller allocations (or “reservations”). A reservation can be assigned to a single project or shared among multiple projects. This flexibility lets you preserve compute power for high-priority and compute-intensive workloads while still saving money relative to the on-demand query model.
Let’s say that your company has committed to 1,000 slots. You can allocate 500 slots to your computationally intensive data science projects, 300 to ETL, and 200 to internal business intelligence, which is more flexible. Idle slots are not siloed or left unused, however: if one reservation is not using all of its slots, the idle slots can be shared seamlessly with the other projects for as long as they would otherwise sit idle.
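The slot example above can be sketched as a small Python simulation. This is a simplified model, not BigQuery’s actual scheduler: the `share_idle_slots` function is hypothetical, and it hands idle slots to reservations in the order they are listed, which is an assumption made purely to keep the sketch short.

```python
def share_idle_slots(reservations: dict, demand: dict) -> dict:
    """Return the slots each reservation ends up using.

    Simplified model (an assumption, not BigQuery's real algorithm):
    each reservation first uses its own slots, then any idle slots are
    lent, in listing order, to reservations that still want more.
    """
    # Step 1: each reservation uses at most its own dedicated slots.
    granted = {
        name: min(size, demand.get(name, 0))
        for name, size in reservations.items()
    }
    idle = sum(reservations.values()) - sum(granted.values())

    # Step 2: lend idle slots to reservations with unmet demand.
    for name in reservations:
        extra = min(idle, demand.get(name, 0) - granted[name])
        granted[name] += extra
        idle -= extra
    return granted


# The 1,000-slot commitment from the text, split 500 / 300 / 200.
reservations = {"data_science": 500, "etl": 300, "bi": 200}
# Hypothetical moment in time: data science is quiet, ETL is busy.
demand = {"data_science": 200, "etl": 450, "bi": 200}

print(share_idle_slots(reservations, demand))
# {'data_science': 200, 'etl': 450, 'bi': 200}
```

Here ETL temporarily borrows 150 of the 300 slots that data science is not using, without changing anyone’s baseline allocation.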
4. Load your data and optimize your schemas
Once you have a clear understanding of how your data will be organized, you can begin populating your data warehouse. BigQuery offers many ways to ingest data: flat files in Google Cloud Storage, prebuilt connectors to apps through the Data Transfer Service, streaming inserts, and compatibility with numerous third-party data migration and ETL tools.
A few simple adjustments to your table schemas can improve results considerably. In particular, applying partitioning, clustering, or both based on expected query patterns can significantly reduce the amount of data that queries scan.
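As a rough illustration of why partitioning reduces the data scanned, the following Python sketch compares a full scan with a scan over date-partitioned data. It is a toy in-memory model with synthetic events, not BigQuery’s storage engine, and all function names are invented for this example.

```python
from collections import defaultdict
from datetime import date, timedelta


def make_events(days=30, per_day=100):
    """Build synthetic events: `per_day` rows for each of `days` dates."""
    start = date(2023, 1, 1)
    return [
        {"event_date": start + timedelta(days=d), "amount": i}
        for d in range(days)
        for i in range(per_day)
    ]


def scan_unpartitioned(events, day):
    """Read every row and filter; returns (total amount, rows scanned)."""
    scanned = 0
    total = 0
    for event in events:
        scanned += 1
        if event["event_date"] == day:
            total += event["amount"]
    return total, scanned


def partition_by_date(events):
    """Group rows by date, mimicking a date-partitioned table."""
    partitions = defaultdict(list)
    for event in events:
        partitions[event["event_date"]].append(event)
    return partitions


def scan_partitioned(partitions, day):
    """Read only the matching partition; returns (total, rows scanned)."""
    rows = partitions.get(day, [])
    return sum(r["amount"] for r in rows), len(rows)


events = make_events()
target = date(2023, 1, 15)

print(scan_unpartitioned(events, target))                   # (4950, 3000)
print(scan_partitioned(partition_by_date(events), target))  # (4950, 100)
```

Both scans return the same answer, but the partitioned one reads 100 rows instead of 3,000. This only helps when queries actually filter on the partitioning column, which is why the choice should follow expected query patterns.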
5. Unify your data investments
Your data and analytics work might require semi-structured and unstructured data in addition to your structured data. It is important to look beyond the “enterprise data warehouse” and consider solutions that also support a centralized data lake.
BigQuery’s federation capabilities allow you to seamlessly query data stored in Google services such as Cloud Storage, Drive, and Bigtable. The BigQuery Storage API gives you fast, high-volume access to data stored in BigQuery. These features help keep data efforts unified and consistent across teams and platforms.
6. Have fun with queries
Now it’s time for you to query your data! Your platform should provide an easy way for people to get started immediately so that they don’t run into any problems.
BigQuery uses ANSI-compliant SQL, so SQL developers can apply their existing skills from the start. Many third-party tools provide native connectors for BigQuery or use BigQuery’s JDBC/ODBC drivers to author queries on the user’s behalf. The BigQuery Migration Service can automate the translation of SQL scripts from previous data warehouse investments. Together, these features help keep data accessible, secure, and within budget, and make it easy to plug the data into user-friendly interfaces for analysis.
If you are making the switch to BigQuery, its many unique features let you do more than simply move existing queries over and continue working as before. You can run large analyses that would not be feasible on another system. You can train a prototype machine-learning model using SQL-based BigQuery ML. You can query streaming data in real time. You can perform geospatial analysis using built-in GIS features. It is time to innovate.
It takes planning and time to build a solid foundation of data.
These tips will help you position your company to succeed in both the short and the long term. They also spare you the hassle of redesigning your warehouse solution as your business grows. It is important to carefully evaluate the benefits of investing in any technology. We encourage you to try BigQuery by using the quickstarts, visiting our Startups page, and reaching out to Google Cloud experts.