Mastering RESTful API Basics for Developers

APIs are the backbone of modern applications. Among the different approaches to building them, one of the most widely used is the RESTful API. In…


Understanding Vector Multiplication in C: MPI Implementation

The blog discusses a method for multiplying a large square matrix by a vector using MPI with a block-column distribution strategy. It describes how Process 0 distributes matrix columns to the other processes, which compute local partial products and then combine results using MPI_Reduce_scatter(). The post first stresses a solid understanding of vector multiplication, explaining how vectors are represented in C, with examples of a one-dimensional vector and a square matrix. The matrix-vector multiplication itself is detailed with a C code snippet, walking through each multiplication step and the final result. The blog prepares readers for implementing efficient parallel computations in MPI.
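
To make the described pipeline concrete, here is a minimal sketch of a block-column matrix-vector multiply finished with MPI_Reduce_scatter(). It is not the post's actual code: the matrix size, the placeholder values, and names like local_A and y_partial are assumptions, and for brevity each process initializes its own column block locally instead of receiving it from Process 0.

```c
/* Sketch: block-column matrix-vector multiply with MPI.
 * Assumes n is divisible by the number of processes. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int n = 8;                    /* global matrix dimension (illustrative) */
    int local_n = n / size;       /* columns owned by this process          */

    /* Each process holds n x local_n columns of A and local_n entries of x. */
    double *local_A = malloc(n * local_n * sizeof(double));
    double *local_x = malloc(local_n * sizeof(double));
    for (int j = 0; j < local_n; j++) {
        local_x[j] = 1.0;
        for (int i = 0; i < n; i++)
            local_A[i * local_n + j] = rank + 1.0;   /* placeholder data */
    }

    /* Local partial product: a full-length vector y_partial = A_block * x_block. */
    double *y_partial = calloc(n, sizeof(double));
    for (int i = 0; i < n; i++)
        for (int j = 0; j < local_n; j++)
            y_partial[i] += local_A[i * local_n + j] * local_x[j];

    /* Sum the partial vectors across processes and scatter local_n
     * entries of the final result to each process. */
    int *recvcounts = malloc(size * sizeof(int));
    for (int p = 0; p < size; p++) recvcounts[p] = local_n;
    double *local_y = malloc(local_n * sizeof(double));
    MPI_Reduce_scatter(y_partial, local_y, recvcounts,
                       MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    printf("Rank %d: first local result entry = %f\n", rank, local_y[0]);

    free(local_A); free(local_x); free(y_partial);
    free(recvcounts); free(local_y);
    MPI_Finalize();
    return 0;
}
```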

Optimizing MPI Communication with Ping-Pong Patterns

This content discusses the challenges of measuring message-passing performance in a distributed system, specifically using a ping-pong pattern with MPI. It highlights the limitations of the C clock() function for timing short exchanges, as it may return zero or inconsistent results when few iterations occur. To obtain reliable data, the post recommends a dynamic iteration scaling approach—starting with a small number of iterations and doubling it until a measurable time is recorded. This method ensures accurate measurements across varying hardware and system loads, ultimately providing a robust benchmark for MPI communication costs essential for optimization in high-performance computing.
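
A sketch of the iteration-doubling idea described above, assuming a two-process ping-pong, a 1 KB message, and an arbitrary 0.1-second threshold; none of these values come from the original post.

```c
/* Sketch of iteration doubling for timing an MPI ping-pong.
 * Run with exactly 2 processes. */
#include <mpi.h>
#include <stdio.h>
#include <time.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char msg[1024] = {0};
    long iters = 1;
    double elapsed = 0.0;

    /* Keep doubling the iteration count until the total elapsed time
     * is large enough for clock() to resolve reliably. */
    while (elapsed < 0.1) {
        iters *= 2;
        MPI_Barrier(MPI_COMM_WORLD);
        clock_t start = clock();
        for (long i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(msg, sizeof msg, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(msg, sizeof msg, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(msg, sizeof msg, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(msg, sizeof msg, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        /* Use rank 0's timing decision on both ranks so their loops agree. */
        MPI_Bcast(&elapsed, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    }

    if (rank == 0)
        printf("%ld round trips in %f s -> %g s per round trip\n",
               iters, elapsed, elapsed / iters);

    MPI_Finalize();
    return 0;
}
```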

Efficient Shipping Time Calculation Using MPI Techniques

The post discusses an advanced problem in distributed computing using MPI (Message Passing Interface) for a large e-commerce operation. It focuses on collecting local minimum and maximum shipping times from various global warehouse hubs to calculate overall global shipping times. The program simulates generating these times using C's random number generator, ensuring the correct relationship between min and max. It applies MPI_Reduce() to aggregate results efficiently across nodes. The author encourages experimentation with different randomization methods and varying the number of MPI processes while providing a GitHub repository for further exploration of relevant MPI examples.
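
A minimal sketch of the aggregation step the post describes, assuming integer shipping times in made-up ranges; the random-number choices and variable names are illustrative rather than the post's code.

```c
/* Sketch: each rank generates a local (min, max) pair of shipping times
 * and MPI_Reduce combines them into global values on rank 0. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Seed differently on each rank so the "warehouses" differ. */
    srand((unsigned)time(NULL) + rank);

    /* Local minimum of 1-5 days; local maximum is at least as large. */
    int local_min = 1 + rand() % 5;
    int local_max = local_min + rand() % 10;

    int global_min, global_max;
    MPI_Reduce(&local_min, &global_min, 1, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);
    MPI_Reduce(&local_max, &global_max, 1, MPI_INT, MPI_MAX, 0, MPI_COMM_WORLD);

    printf("Rank %d: local min %d, local max %d\n", rank, local_min, local_max);
    if (rank == 0)
        printf("Global shipping time range: %d to %d days\n",
               global_min, global_max);

    MPI_Finalize();
    return 0;
}
```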

Efficient Data Aggregation with MPI_Reduce in Distributed Systems

In distributed computing, MPI programming typically designates a root node to manage data distribution and result aggregation among multiple nodes. The MPI_Reduce() function plays a critical role in performing global computations efficiently, allowing nodes to send data and gather results via message passing. Each non-root node computes its contribution, while the root node consolidates them. The function takes the parameters sendbuf, recvbuf, count, datatype, op, root, and comm. While MPI_Reduce() returns the result only to the root, MPI_Allreduce() delivers it to all nodes. This understanding of MPI_Reduce() lays the groundwork for more complex computational challenges.
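
The parameter list is easiest to read in context. The sketch below shows one MPI_Reduce() call with each argument labeled and an MPI_Allreduce() call for contrast; the summed values are purely illustrative.

```c
/* Sketch: every rank contributes one partial value, only the root
 * receives the MPI_Reduce total, and every rank receives the
 * MPI_Allreduce total. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int local = rank + 1;        /* this node's contribution            */
    int total_on_root = 0;       /* meaningful only on the root         */
    int total_everywhere = 0;    /* valid on every rank after Allreduce */

    /*            sendbuf recvbuf            count datatype op      root comm */
    MPI_Reduce   (&local, &total_on_root,    1, MPI_INT, MPI_SUM,   0, MPI_COMM_WORLD);
    MPI_Allreduce(&local, &total_everywhere, 1, MPI_INT, MPI_SUM,      MPI_COMM_WORLD);

    if (rank == 0)
        printf("Root sees sum %d of %d contributions\n", total_on_root, size);
    printf("Rank %d sees Allreduce sum %d\n", rank, total_everywhere);

    MPI_Finalize();
    return 0;
}
```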

Introduction to Multi-Threaded Programming: Key Concepts

This blog post discusses how multi-tasking enables efficient CPU time-sharing among programs, allowing them to seemingly run simultaneously on a single-core processor. The OS scheduler manages task switching, allowing programs like a music player and a word processor to share CPU time effectively. Context switching is a rapid process that gives the appearance of parallel execution. However, distinct processes have isolated memory spaces, complicating data sharing. Threads within a process, on the other hand, share address space, simplifying communication and resource management. This post also introduces the pthread library for creating threads in C, showcasing the practicality of multi-threading.
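
As a minimal illustration of the shared-address-space point, the sketch below creates two threads with the pthread library that both read the same global variable; the names and message are assumptions, not the post's example.

```c
/* Sketch: two threads in one process share the same address space,
 * so both can read the same global data without message passing. */
#include <pthread.h>
#include <stdio.h>

static const char *shared_message = "hello from the shared address space";

static void *worker(void *arg) {
    long id = (long)arg;
    /* Both threads can read shared_message directly. */
    printf("Thread %ld sees: %s\n", id, shared_message);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, (void *)1L);
    pthread_create(&t2, NULL, worker, (void *)2L);
    pthread_join(t1, NULL);   /* wait for both threads to finish */
    pthread_join(t2, NULL);
    return 0;
}
```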

Building a Multi-Threaded Task Queue with Pthreads

Question 4: You are tasked with developing a multi-threaded system using Pthreads to manage a dynamic task queue. The program should start with a user-specified…

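The excerpt above is truncated, so the following is only a hedged sketch of one common way such a system is structured: a queue protected by a pthread mutex and condition variable, with a fixed number of worker threads. All names, sizes, and the shutdown scheme are assumptions rather than the post's actual design.

```c
/* Sketch: fixed-capacity task queue guarded by a mutex and condition
 * variable; workers pull task IDs until a shutdown flag is set.
 * No overflow handling, since this is only an illustration. */
#include <pthread.h>
#include <stdio.h>

#define QUEUE_CAP   64
#define NUM_WORKERS  4

static int queue[QUEUE_CAP];
static int head = 0, tail = 0, count = 0, done = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

static void enqueue(int task) {
    pthread_mutex_lock(&lock);
    queue[tail] = task;
    tail = (tail + 1) % QUEUE_CAP;
    count++;
    pthread_cond_signal(&not_empty);      /* wake one waiting worker */
    pthread_mutex_unlock(&lock);
}

static void *worker(void *arg) {
    long id = (long)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (count == 0 && !done)
            pthread_cond_wait(&not_empty, &lock);
        if (count == 0 && done) {         /* queue drained, no more tasks coming */
            pthread_mutex_unlock(&lock);
            return NULL;
        }
        int task = queue[head];
        head = (head + 1) % QUEUE_CAP;
        count--;
        pthread_mutex_unlock(&lock);
        printf("Worker %ld processing task %d\n", id, task);
    }
}

int main(void) {
    pthread_t workers[NUM_WORKERS];
    for (long i = 0; i < NUM_WORKERS; i++)
        pthread_create(&workers[i], NULL, worker, (void *)i);

    for (int t = 0; t < 10; t++)          /* 10 initial tasks, purely illustrative */
        enqueue(t);

    pthread_mutex_lock(&lock);
    done = 1;                             /* signal that no more tasks will arrive */
    pthread_cond_broadcast(&not_empty);
    pthread_mutex_unlock(&lock);

    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(workers[i], NULL);
    return 0;
}
```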

Understanding Parallelism in Uni-Processor Systems

The content explains that a uni-processor system has only one CPU, which can execute only one piece of code at a time. This leads to pseudo-parallelism, where multiple programs appear to run simultaneously by sharing CPU time. As an illustration, two simple programs are presented: one continuously prints "Hello World" and the other prints "Hello Boss." In practice they take turns on the CPU, coordinated by the operating system's scheduler. The blog also introduces terminology such as process and infinite loop, providing insight into how parallelism works even in environments with limited processing capability.
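
For reference, a program like the two described above is just an ordinary C process stuck in an infinite loop; the sketch below shows the "Hello World" variant, with the understanding that the "Hello Boss" program differs only in its message.

```c
/* Sketch: one of the two example processes. On a uni-processor the OS
 * scheduler alternates between this program and the other one, giving
 * the appearance of parallel execution. */
#include <stdio.h>

int main(void) {
    while (1) {                  /* infinite loop: never yields voluntarily */
        printf("Hello World\n"); /* the second program prints "Hello Boss"  */
    }
    return 0;
}
```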