The Domain Name System (DNS) is one of the most important services on the Internet. By translating easy-to-remember alphanumeric names into IP addresses, it enables Internet users around the world to seamlessly access content. While the basic function of the DNS is simple, its huge size, distributed ownership, and the diverse processes used in its configuration and management present significant challenges in understanding its characteristics and in making improvements to the system. Nonetheless, the importance of the DNS motivates the effort required to obtain that understanding.

In this project, we seek to build a broad and fundamental body of knowledge on the characteristics of DNS configurations and the management practices used in operating the DNS. Our approach is distinguished by being macroscopic: we seek to understand and characterize as much of the contents of the DNS as is practically available. The combination of this goal and the innovative analysis methods we propose will yield a first-of-its-kind set of research results and resources that we will share with the community.

Our research agenda consists of three components. In pursuing these components, we will make use of a fusion of multiple large-scale datasets that have recently become available but which have not been fully explored to date. A significant portion of our effort will be devoted to bias assessment, data assurance, and measurement-based verification of all data used in the project.

The first project component is devoted to analyzing and statistically modeling the DNS contents reflected in these large datasets. A key characteristic of our approach is to view the DNS contents as a directed graph. Our work thus far shows that the graph-based view of the DNS is remarkably powerful as a tool for understanding the contents of the DNS and as a guide to the operational aspects of DNS. Modeling the DNS graph in terms of structural components and statistical properties forms a foundation for the other components of our project.

The second component of this project seeks to better understand the operational aspects of DNS configuration and use, particularly by organizations that place entries into the DNS. We will build on the graph-based view of the DNS to characterize and quantify various activities performed by domain operators. We will consult with experts in the operations community for validation of our findings.

The last component of this project seeks to build tools for the research community, supporting machine learning applications for the DNS, anomaly detection in the DNS, and a “DNS Workbench” enabling synthetic DNS workload generation for simulation and laboratory-based research on DNS configuration, management and performance.
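To make the graph-based view described above more concrete, the following is a minimal sketch of how DNS contents might be represented as a directed graph. It is an illustration under stated assumptions, not the project's actual model: the choice of nodes (names and addresses), the choice of edges (one per resource record, from owner name to record target), the sample records, and the helper names `build_graph`, `in_degree`, and `reachable` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample records: (owner name, record type, target).
# Names use reserved example domains; addresses come from the
# 192.0.2.0/24 documentation range.
RECORDS = [
    ("example.com", "NS", "ns1.example.net"),
    ("example.com", "NS", "ns2.example.net"),
    ("www.example.com", "CNAME", "example.com"),
    ("example.com", "A", "192.0.2.10"),
    ("example.org", "NS", "ns1.example.net"),
    ("ns1.example.net", "A", "192.0.2.53"),
    ("ns2.example.net", "A", "192.0.2.54"),
]


def build_graph(records):
    """Assumed model: one directed, labeled edge per record, owner -> target."""
    graph = defaultdict(set)
    for owner, rtype, target in records:
        graph[owner].add((rtype, target))
    return graph


def in_degree(graph):
    """Count incoming edges per node, e.g. how many records point at a name server."""
    counts = defaultdict(int)
    for edges in graph.values():
        for _rtype, target in edges:
            counts[target] += 1
    return counts


def reachable(graph, start):
    """Return every node reachable from `start` by following edges (iterative DFS)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _rtype, target in graph.get(node, ()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


if __name__ == "__main__":
    g = build_graph(RECORDS)
    print(in_degree(g)["ns1.example.net"])
    print(sorted(reachable(g, "www.example.com")))
```

In this toy model, in-degree gives a rough measure of how many records depend on a given name server or address, and reachability lists everything a name transitively depends on. Structural and statistical properties of a real DNS graph would be defined over far larger datasets and possibly a different choice of nodes and edges.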

Publications
This material is based upon work supported by the National Science Foundation under Grant Nos. 2319367-2319369. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.