Curriculum

The MS in Data and Information Management curriculum encompasses database administration; data warehousing; data mining; and algorithms for efficiently analyzing, searching, retrieving, and transforming large data sets. Courses blend theory and practice to teach traditional data management methods as well as the latest technology and techniques for managing and processing big data.

In addition to the courses shown below, students must complete six hours of capstone project work (ITEC 695). The program is currently designed for students to enter in the fall semester.

Entry to the program

The Data and Information Management (DAIM) program is accepting students for fall semesters only at this time. Transfer credit is possible, and a part-time option is conditionally available. For more information, please contact Dr. Jeff Pittges at jpittges@radford.edu.

Course Descriptions

Data Engineering (ITEC 540)

Project-based learning experience to acquire programming skills to ingest, transform, and explore data. Students use Python, Pandas, Spark, Databricks, Amazon Web Services, and Apache Airflow to gain hands-on experience building pipelines that collect, process, visualize, and store data; assess data quality; and explore, clean, and prepare data for analysis, machine learning, and other data science applications. Emphasizes programming techniques for processing big data in distributed environments.

Learn more about ITEC 540

  • Programming techniques for parallel and distributed computing 
    • MapReduce
    • Functional programming
    • Service-oriented software architecture
  • Ingest
    • Processing CSV and JSON data formats
    • Connecting to APIs and database systems
    • Web scraping 
  • Transform
    • Data frames
    • Dates and times
    • Data profiling and cleaning
  • Explore 
    • Statistics: t-tests, histograms, and distributions
    • Data visualization
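
The MapReduce and functional-programming topics above can be sketched in miniature with Python's built-in `map` and `reduce`. This single-process word count (sample lines invented for illustration) is a stand-in for what Spark or Hadoop would distribute across a cluster:

```python
from functools import reduce
from collections import Counter

def mapper(line):
    # Map phase: emit a count of 1 for each word in one line.
    return Counter(line.lower().split())

def reducer(a, b):
    # Reduce phase: merge two partial word counts.
    a.update(b)
    return a

lines = ["big data big pipelines", "data pipelines at scale"]
word_counts = reduce(reducer, map(mapper, lines), Counter())
# word_counts["data"] == 2, word_counts["big"] == 2
```

In a real cluster the mapper runs on many machines in parallel and the reducer merges partial results; the functional structure is what makes that distribution possible.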

 

Students who complete the course will be able to:

  • Ingest data from files, APIs, database systems, and webpages.
  • Process data in CSV and JSON format.
  • Convert dates and times and analyze time series data.
  • Assess data quality.
  • Profile and clean data.
  • Explore and visualize data using histograms and other graphics.
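
As a rough sketch of several of these outcomes (reading CSV and JSON, converting dates, and building a histogram-style summary), here is a standard-library version; the course itself uses Pandas, and the sensor readings below are invented:

```python
import csv
import io
import json
from collections import Counter
from datetime import datetime

# Hypothetical temperature readings, once as CSV and once as JSON.
csv_text = "ts,temp\n2024-01-01,20\n2024-01-02,21\n2024-02-01,19\n"
json_text = '[{"ts": "2024-02-02", "temp": 22}]'

# Ingest: parse both formats into one list of records.
rows = list(csv.DictReader(io.StringIO(csv_text))) + json.loads(json_text)

# Transform: convert date strings to datetime objects and values to floats.
for r in rows:
    r["ts"] = datetime.strptime(str(r["ts"]), "%Y-%m-%d")
    r["temp"] = float(r["temp"])

# Explore: a simple histogram of readings per month.
per_month = Counter(r["ts"].strftime("%Y-%m") for r in rows)
# per_month == {"2024-01": 2, "2024-02": 2}
```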

 

Data Warehousing and Visualization (ITEC 542)

Advanced examination of database system principles, studying techniques for modeling, managing, and analyzing large data sets. The course covers the architectural components that support enterprise-level business intelligence, with in-depth coverage of the dimensional model, data integration, data visualization, performance dashboards, machine learning algorithms, and the application of common data mining techniques. Students use Tableau to explore, analyze, and visualize a dataset and communicate their results in a paper, video, and presentation. Students must have completed a database course that includes hands-on experience with the relational model, SQL, security, database design, and stored procedures.

Learn more about ITEC 542

  • Introduction to business intelligence
  • Data Warehousing
    • Dimensional modeling
    • Warehouse aggregates
    • Data quality
    • Extract, transform, and load (ETL) process
    • Physical design   
    • Data warehousing lifecycle
  • Reporting and data analysis
    • Online analytical processing (OLAP)
    • Commercial query and reporting tools
  • Data mining
    • Data mining methodology
    • Statistical methods
    • Decision trees
    • Association rules
    • Clustering
    • Neural networks
    • Data preparation

Students who complete this course will be able to:

  • Design and develop a Star schema and describe best practices for dimensional modeling.
  • Design and develop a basic ETL process and explain the challenges of the ETL process.
  • Identify and develop valuable aggregates for a given problem.
  • Design and develop different types of reports and reporting requirements.
  • Describe the limitations of SQL with respect to analytical reports.
  • Describe common data mining tasks.
  • Describe data mining techniques and implement at least one technique.
  • Explain the value of transactional data with respect to business intelligence.
  • Explain the importance of data quality and the challenges of producing high quality data.
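
As a hedged sketch of the first two outcomes (table and column names are invented for illustration), a minimal star schema and the kind of rollup query it supports can be expressed in SQL, here via Python's built-in sqlite3:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# A minimal star schema: one fact table keyed to two dimension tables.
cur.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date,
    product_key INTEGER REFERENCES dim_product,
    amount REAL
);
INSERT INTO dim_date VALUES (1, 2024, 1), (2, 2024, 2);
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Manual', 'Books');
INSERT INTO fact_sales VALUES (1, 1, 100.0), (1, 2, 20.0), (2, 1, 50.0);
""")

# A typical aggregate: revenue by category and month.
rows = cur.execute("""
    SELECT p.category, d.month, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_date d ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()
# rows == [('Books', 1, 20.0), ('Hardware', 1, 100.0), ('Hardware', 2, 50.0)]
```

Precomputing and storing such aggregates is what the "warehouse aggregates" topic refers to: the same query against a large fact table would be far cheaper against a summary table.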

Cloud Data Engineering (ITEC 543)

Hands-on introduction to designing and developing data pipelines on a cloud platform to ingest, process, store, and visualize data. In-depth study of data engineering services emphasizing purpose-built, cloud-native NoSQL database services and event-driven, serverless computing. This course prepares students for the AWS Data Engineering certification.

Learn more about ITEC 543

Topics include:

  • Cloud fundamentals
  • Serverless computing
  • Data ingestion and transformation
  • Purpose-built, cloud-native NoSQL database services

Students who successfully complete this class will have:

  • Explained cloud computing concepts and principles
  • Explained tradeoffs of key-value and document databases
  • Designed and developed a cloud-based, web application with a single key-value table storing multiple entities with an indexing schema to support predefined access paths
  • Designed, developed, and queried a graph database
  • Gained hands-on experience with common data engineering services
  • Prepared for industry certifications including cloud practitioner and data engineer
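
One of the outcomes above, a single key-value table storing multiple entities with an indexing scheme for predefined access paths, can be sketched with a plain Python dict standing in for a NoSQL table (the keys and entities below are invented for illustration):

```python
# Sketch of NoSQL single-table design: multiple entity types share one
# key-value table, with composite keys encoding the predefined access paths.
table = {}

def put(pk, sk, item):
    # Store an item under a (partition key, sort key) pair.
    table[(pk, sk)] = item

# A customer profile and two of their orders live in one partition.
put("CUST#42", "PROFILE", {"name": "Ada"})
put("CUST#42", "ORDER#2024-001", {"total": 99.0})
put("CUST#42", "ORDER#2024-002", {"total": 15.0})

def query(pk, sk_prefix=""):
    # Access path: all items in a partition whose sort key has a prefix.
    return [v for (p, s), v in sorted(table.items())
            if p == pk and s.startswith(sk_prefix)]

orders = query("CUST#42", "ORDER#")
# Both orders come back with one partition read, without scanning
# other customers' data.
```

The design choice is that the table is organized around the application's queries rather than around normalized entities, which is the central trade-off of key-value modeling.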

 

Distributed Systems Design (ITEC 641)

Investigates techniques for managing massive volumes of data and studies the design of scalable systems, on-demand computing, and cloud computing. Provides hands-on experience with Hadoop and NoSQL databases.

Learn more about ITEC 641


Topics include:

  • Introduction to distributed databases
    • Need for distributed databases
    • Types of distributed architectures
    • Challenges with distributed computing architectures
  • Comparison of various distributed DBMS architectures
  • Scalability and reliability of distributed architectures
  • Performance tuning of distributed databases

Students who successfully complete this class will be able to:

  • Describe and apply general principles and concepts of distributed computing and distributed computing networks.
  • Design and implement distributed databases.
  • Compare and contrast consolidated and distributed query processing and concurrency control.
  • Describe distributed database management reliability.
  • Describe NoSQL solutions for voluminous semi-structured data.
  • Describe and apply techniques to tune database management systems.
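
As one hedged illustration of how distributed databases spread data across machines (the node names and keys are invented, and this is only one of several placement schemes), hash partitioning routes each row to a node deterministically:

```python
import hashlib

# Hash partitioning: hash each key and map it onto one of the nodes,
# so every client can route a query without a central lookup table.
NODES = ["node-0", "node-1", "node-2"]

def node_for(key):
    # A stable cryptographic hash gives an even, deterministic spread.
    digest = hashlib.sha256(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

placement = {k: node_for(k) for k in ["user:1", "user:2", "user:3", "user:4"]}
# The same key always maps to the same node; different keys spread
# across the cluster, which is what makes the system scale.
```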

 

Information Security and Assurance (ITEC 645)

Studies reliability, security, and privacy issues related to storing, transmitting, and processing large data sets. Studies techniques to secure databases and system infrastructure and methods to assure data integrity through fault tolerance and data recovery.

Learn more about ITEC 645

  1. Fundamentals of information security and privacy
    • Goals of security (confidentiality, integrity, availability, authentication, non-repudiation, and accountability)
    • Vulnerabilities and exploits on DBMSs and data sets (e.g., programming flaws, SQL injection, statistical inference attacks)
    • Threat modeling and security analysis
  2. Information security with data storage and management
    • Cryptography (symmetric key, asymmetric key, secure hashes, and modes of operation)
    • Secure design principles (e.g., least privilege, complete mediation, separation of privilege, least common mechanism, defense in depth)
    • Authentication
    • Access control
    • Access logs
    • Security mechanisms (e.g., perimeter security, host-based security)
    • Secure operations (backups, hardening distributed databases, disaster recovery, business continuity)
  3. Privacy
    • Statistical inference attacks and controls
    • Legal issues (e.g., HIPAA, FERPA, ECPA)
  4. Reliability
    • Failures
    • Fault tolerance

Students who complete this course will be able to:

  1. Enumerate the main goals of security and privacy including confidentiality, integrity, availability, authentication, non-repudiation, and accountability.
  2. Analyze and develop threat models for the security of database management systems, networks, and distributed database infrastructures.
  3. Analyze and develop threat models on the privacy of data (such as inference attacks).
  4. Perform security analysis on centralized and distributed database installations using techniques such as the Open Source Security Testing Methodology Manual (OSSTMM).
  5. Describe and apply cryptographic algorithms, and mechanisms including secure hashes, secret key and public key cryptography, and their modes of operation to secure both stored data and data in transit across networks.
  6. Describe and apply standard secure design principles including least privilege, complete mediation, least common mechanism, economy of mechanism, defense in depth, reluctance to trust, and privacy to the different database installations.
  7. Describe and deploy authentication, fine-grained access control and accountability mechanisms (such as access logs) on database management systems and distributed and centralized database installations.
  8. Describe and deploy mechanisms that provide security such as intrusion detection systems and privacy such as those that protect against statistical inference attacks on databases.
  9. Perform secure operations including backup, recovery and secure updates.
  10. Administer security by enumerating the steps of risk management and developing security policies and plans, such as acceptable usage policies and business continuity and disaster recovery plans.
  11. Enumerate and identify privacy issues of data taking into account the federal and state laws that govern privacy such as HIPAA, FERPA, and the Electronic Communication and Privacy Act.
  12. Describe reliability mechanisms to achieve fault tolerance in distributed databases.
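
Outcome 5 above, applying secure hashes to protect stored data, can be sketched with Python's standard library; the record and password below are invented for illustration:

```python
import hashlib
import hmac
import os

# A secure hash detects tampering: any change to the record changes the digest.
record = b"account=42;balance=100.00"
digest = hashlib.sha256(record).hexdigest()
assert hashlib.sha256(b"account=42;balance=999.00").hexdigest() != digest

# Passwords are stored as salted, deliberately slow hashes, never in plaintext.
salt = os.urandom(16)
stored = hashlib.pbkdf2_hmac("sha256", b"correct horse", salt, 100_000)

def verify(password):
    candidate = hashlib.pbkdf2_hmac("sha256", password, salt, 100_000)
    # Constant-time comparison resists timing attacks.
    return hmac.compare_digest(candidate, stored)
```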

 

Enterprise Information Architecture (ITEC 647)

Investigates comprehensive, enterprise-wide approaches to organize, protect, and control trusted information assets. Students gain hands-on experience with the Alation data catalog while studying techniques to govern, control, and protect data including master data management, data quality, and data integration. Studies best practices including DataOps, data mesh, and observability.

Learn more about ITEC 647

  1. Information architecture
  2. Information governance
  3. Master data management
  4. Information quality
  5. Data integration
  6. Metadata management

Students who complete this course will be able to:

  1. Explain the importance of data governance.
  2. Develop policies for protecting and securing data and information assets.
  3. Design and develop a system to protect and secure data and information.
  4. Develop a program that integrates data from multiple sources.
  5. Profile data elements.
  6. Analyze the quality of an individual data element.
  7. Analyze the overall quality of a data source.
  8. Design and develop a system to capture and manage metadata.
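
Outcomes 5 through 7, profiling a data element and analyzing its quality, can be sketched in a few lines; the state-code values below are invented:

```python
# Profiling one data element: null rate and distinct-value counts,
# two of the basic metrics a data catalog or quality tool reports.
values = ["VA", "va", None, "NC", "VA", None, "nc"]

nulls = sum(v is None for v in values)
non_null = [v for v in values if v is not None]

profile = {
    "count": len(values),
    "null_rate": nulls / len(values),
    "distinct": len(set(non_null)),                       # raw distinct values
    "distinct_normalized": len({v.upper() for v in non_null}),  # after casefolding
}
# The gap between raw and normalized distinct counts (4 vs. 2 here)
# is itself a quality signal: inconsistent encodings of the same value.
```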

 

Distributed Algorithms (ITEC 660)

Explores data structures and algorithms for storing and processing traditional data and big data. Provides hands-on experience with Spark and Scala.

Learn more about ITEC 660

  1. Analysis of algorithms
    • Time and space
    • Amortized analysis
    • I/O bottlenecks
  2. Memory hierarchy
    • Caching
    • External memory organization (disk organization)
  3. Sorting and searching algorithms (counting sort)
  4. External memory and cache-oblivious data structures and algorithms (e.g., types of B-trees)
  5. Hashing
  6. Algorithms that exploit temporal and spatial locality
  7. Succinct data structures (rank, tries, suffix arrays) to store data compactly.
  8. Advanced topics, such as
    • Data compression
    • Pattern matching
    • Search engine indexing
    • NP completeness
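
Counting sort, named in topic 3 above, can be sketched as follows; it runs in O(n + k) time for n keys drawn from the integer range 0..k, beating the O(n log n) bound of comparison sorts when k is small:

```python
def counting_sort(keys, k):
    # Tally how many times each key in 0..k appears.
    counts = [0] * (k + 1)
    for x in keys:
        counts[x] += 1
    # Emit each key value in order, as many times as it was seen.
    out = []
    for value, c in enumerate(counts):
        out.extend([value] * c)
    return out

print(counting_sort([3, 1, 4, 1, 5, 2, 1], 5))  # prints [1, 1, 1, 2, 3, 4, 5]
```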

Students who complete this course will be able to:

  1. Compare and contrast temporal and spatial efficiency of distributed algorithms and data structures used to store, query and process medium to large datasets.
  2. Describe and analyze the performance issues of the different memory organizations used to store large data sets.
  3. Describe and apply data structures and distributed algorithms that achieve efficiencies in query and processing times of medium to large datasets.
  4. Describe and apply data structures and algorithms that store data compactly.
  5. Describe current distributed algorithms and data structures used to store, query and analyze medium to large datasets.

 

Machine Learning and AI (ITEC 685)

Studies techniques for analyzing structured, unstructured, and semi-structured data at rest and in motion. Studies non-traditional data sources including social media, mobile devices, and sensors; emerging analytical applications; real-time processing of data streams; and massively parallel processing technology. 

Learn more about ITEC 685

  1. Databases and their evolution
  2. Big data technology, NoSQL
  3. AI techniques
  4. Logic rules, uncertainty
  5. Bayes' rule, Naïve Bayes, Bayesian networks
  6. Sentiment analysis
  7. Association rule mining
  8. Latent variable models, machine learning
  9. Clustering, classification
  10. Linear and logistic regression
  11. Least squares, optimization
  12. Non-linear models, neural networks
  13. Dimensionality reduction
  14. Anomaly detection
  15. Recommender systems
  16. Parallel computing, MapReduce
  17. Analytics tools

Students who complete this course will be able to:

  1. Categorize data into groups based on attributes.
  2. Classify information based on existing data.
  3. Identify the relationship between elements of a decision.
  4. Understand optimization, maximizing certain outcomes while minimizing others.
  5. Develop decision logic or rules that will produce the desired action.
  6. Predict future events based on an appropriate model.
  7. Seek out subtle data patterns to answer questions about customer behavior, as in fraud detection models.
  8. Understand requirements in using big data analytics.
  9. Simulate human behavior or reaction to given stimuli or scenarios.
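
Topics 5 and 6 in the outline above, Naïve Bayes and sentiment analysis, combine naturally; this is a minimal sketch with invented training documents and add-one (Laplace) smoothing:

```python
import math
from collections import Counter

# Toy labeled documents, invented for illustration.
docs = [("good great fun", "pos"), ("great movie", "pos"),
        ("bad boring", "neg"), ("bad bad awful", "neg")]

word_counts = {"pos": Counter(), "neg": Counter()}
label_counts = Counter()
for text, label in docs:
    label_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for c in word_counts.values() for w in c}

def log_posterior(text, label):
    # log P(label) + sum of log P(word | label), with add-one smoothing
    # so unseen words never zero out a class.
    total = sum(word_counts[label].values())
    lp = math.log(label_counts[label] / len(docs))
    for w in text.split():
        lp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
    return lp

def classify(text):
    return max(("pos", "neg"), key=lambda lab: log_posterior(text, lab))

print(classify("great fun"))  # prints pos
```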

 

Practicum in Data Engineering: Capstone Project (ITEC 695)

Students consult with their advisor and mentors to design a project that demonstrates their skills and experience and prepare for their chosen career path. Each student works with at least one industry mentor. Students design, develop, and test a data pipeline. Projects focus on aspects most relevant to each student (e.g., cloud computing, data engineering, data analysis, predictive modeling, and security). Students propose and defend their project to the Capstone Project Committee, and students present their results in a public forum.

Learn more about ITEC 695

Each section of ITEC 695 will have one instructor. Each student will have a project advisor. Each student will design, implement, and test a substantial component of an information system. Multiple students may work together to develop an information system involving multiple components.