Saturday, 26 January 2019

YouTube Videos

The complete list so far!

Thursday, 24 January 2019

EF CORE 2.1 BEST PRACTICES, TIPS & TRICKS


With the recent release of .NET Core 2.1 and, along with it, Entity Framework Core 2.1, I thought I'd share a few tips and best practices on how to use EF efficiently and avoid some common pitfalls.
I’ve divided these tips into four sections: maintainability, performance, troubleshooting and testing. Without further ado, let’s get going!

Maintainability

Use eager loading

Before EF Core was released I was one of those developers who were used to the comfort of lazy loading. I knew it was easy to accidentally trigger lazy evaluation of navigation properties, but over time I learned to avoid those cases in the first place. Things worked quite smoothly, especially on smaller projects, and the code was pretty concise since EF retrieved the data for me whenever needed.
Then EF Core was released and, boom, it didn’t have lazy loading at all. It only had eager (and explicit) loading, so I thought it was probably time to give it a try. Now, after using it in many projects for about two years, I can safely say that it makes a developer’s life better in many ways. Most importantly, eager loading forces you to think about data access patterns: what data you need to access and how. Some additional benefits of eager loading that I can think of:
  1. Nothing is included by default so you get no surprises
  2. It is much easier to understand how your code gets turned into SQL and executed
  3. Your code will perform better in most cases
  4. You can use true POCO entity classes without leaking EF concepts into them (e.g. virtual navigation properties)

Group and re-use include statements

One of the downsides of eager loading is that your code can quickly become somewhat polluted with include statements.
// An imaginary e-commerce example using eager loading for fetching
// an order and its Customer and LineItems navigation properties

var order = await _context.Orders
                .Include(x => x.LineItems)
                .Include(x => x.Customer).ThenInclude(x => x.BillingAddress)
                .Include(x => x.Customer).ThenInclude(x => x.ShippingAddress)
                .FirstOrDefaultAsync(x => x.Id == id);
That works just fine, but as your application grows you’ll notice that those Include statements are absolutely everywhere. You’ll also notice that you need to include the same properties (or a subset of them) in many different queries, which makes them harder to manage. To keep performance optimal you shouldn’t load unnecessary properties, but you also need to make sure you’ve loaded all the necessary ones for any given scenario.
What we can do to overcome this problem is to extract the Include-calls to separate extension methods.
public static class OrderQueryExtensions
{
    public static IQueryable<Order> IncludeLineItems(this IQueryable<Order> query)
    {   
        return query.Include(x => x.LineItems);
    }

    public static IQueryable<Order> IncludeCustomer(
        this IQueryable<Order> query,
        bool shippingAddress = false,
        bool billingAddress = false)
    {
        IQueryable<Order> customerQuery = query.Include(x => x.Customer);
        if (shippingAddress)
        {
            customerQuery = customerQuery
                .Include(x => x.Customer)
                .ThenInclude(x => x.ShippingAddress);
        }
        if (billingAddress)
        {
            customerQuery = customerQuery
                .Include(x => x.Customer)
                .ThenInclude(x => x.BillingAddress);
        }

        return customerQuery;
    }
}
Now you can use those extensions in queries:
var order = await _context.Orders
                .IncludeLineItems()
                .IncludeCustomer(shippingAddress: true, billingAddress: true)
                .FirstOrDefaultAsync(x => x.Id == id);
This will give you a more manageable approach that is also easier to read and understand for other developers.

Don’t initialize collections manually

One lesson that I’ve learned from practice is not to manually initialize collections in your entities. The reason is that if you forget to include a navigation property, initializing it in the constructor will only mask that bug: your code sees an empty collection instead of failing. Instead, write comprehensive tests that catch these errors in the first place.
public class Order
{
    public Order()
    {
        // Do not do this: it's better to get an exception than to
        // silently treat the order as having no line items when we
        // forget to include them in the query
        LineItems = new List<LineItem>();
    }

    public ICollection<LineItem> LineItems { get; set; }
}
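A test of that kind could be sketched roughly as follows, using xUnit; the service, field, and method names here are hypothetical, not part of the example above:
```csharp
[Fact]
public async Task GetOrderAsync_LoadsLineItems()
{
    // _orderService and _existingOrderId are assumed test fixtures.
    var order = await _orderService.GetOrderAsync(_existingOrderId);

    // If the query forgot to Include LineItems (and the collection is
    // not initialized in the constructor), LineItems is null and this
    // assertion fails instead of silently passing with an empty list.
    Assert.NotEmpty(order.LineItems);
}
```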

Performance

Always async-await

If you’re not doing it already, start using async and await to improve the scalability of your application. Especially for I/O-intensive operations like heavier SQL queries, this allows your application to serve other requests while waiting for the database to respond.
// Instead of using a synchronous method, e.g.
_context.Orders.FirstOrDefault(x => x.Id == id);

// Use the asynchronous alternative
await _context.Orders.FirstOrDefaultAsync(x => x.Id == id);

Avoid client evaluation

Client evaluation is arguably one of the most dangerous features of EF Core since it’s so easy to miss:
EF Core will try to translate any LINQ expression you write to query your DB into SQL, and whatever it can’t translate will be evaluated on the client (without you knowing, unless you look at the logs).
The nice part is that almost any query you write will work. But, and this is a big but: it can kill performance. If you query a large dataset, any client evaluation happens only after all the rows have been returned from the DB. This has the potential to cause severe performance and memory issues if you’re not careful.
// Here we want to only fetch orders created by John Doe but since
// the IsCreatedBy cannot be translated to SQL Entity Framework silently
// fetches all orders and performs the filtering in memory

var orders = _context.Orders
    .Where(order => order.IsCreatedBy("John Doe"))
    .ToList();
Whenever EF falls back to evaluating a query on the client it will log a warning (in case you have logging enabled, that is). You can also disable client evaluation completely.

Use projections where appropriate

Projections are easily one of the most important tools for keeping your queries and application performant: whenever you need to load a larger set of entities from the database, e.g. to show a list of orders in an e-commerce website, do not query the entities directly!
Instead, use projections, which let you return only those fields from the database that you really need.
var orderHeaders = await _context.Orders
    .Select(o => new OrderHeader {
        Id = o.Id,
        CustomerName = o.Customer.Name,
        NumberOfLineItems = o.LineItems.Count(),
        PlacedOn = o.PlacedOn
    })
    .ToListAsync();
In the example above, only the data we select is returned from the database, which can improve performance dramatically compared to loading all the entities with all their fields and navigation properties. Note that you don’t need to include any navigation properties when using projections.
In general, I tend to use projections when I need to query data for views over multiple entities, like lists for example. When I’m only dealing with a single entity or need to modify data, I load the actual entity.
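Note that the OrderHeader type in the query above is not an entity; it’s a plain read model defined for the view. A minimal sketch, with property types assumed from the query:
```csharp
using System;

// A plain DTO used only as a projection target; it is not mapped
// as an entity and has no navigation properties of its own.
public class OrderHeader
{
    public int Id { get; set; }
    public string CustomerName { get; set; }
    public int NumberOfLineItems { get; set; }
    public DateTime PlacedOn { get; set; }
}
```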

Optimize correlated sub-queries

Correlated sub-queries have the potential to kill query performance as they will result in N + 1 queries. EF Core 2.1 now allows you to optimize such queries by utilizing buffering, resulting in only 2 queries.
Here’s an example of a correlated sub-query that results in N + 1 queries:
var valuableLineItems = _context.Orders
    .SelectMany(
        x => x.LineItems
                .Where(item => item.Amount > 100)
                .Select(item => item.Name)
    );
Running this query will first fetch all orders and then run the sub-query for each of them returning all items with amount greater than 100.
To allow buffering and squeeze this into just 2 queries (one for orders, one for line items) all you need to do is add an explicit .ToList() after the inner query:
var valuableLineItems = _context.Orders
    .SelectMany(
        x => x.LineItems
                .Where(item => item.Amount > 100)
                .Select(item => item.Name)
                .ToList()
    );

Don’t forget to declare indexes

This isn’t anything new or fancy. Just make sure that you explicitly add indexes for the fields you query by that aren’t already covered as primary or foreign keys.
// In your database context class

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<Order>()
        .HasIndex(x => x.PlacedOn);
}

Troubleshooting

Turn on logging

EF Core logs a lot of important and helpful messages, such as warnings about client evaluation and other issues. Make sure that you are capturing those logs, for example with my favorite logging library, Serilog.
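As a sketch, wiring Serilog into EF Core could look roughly like this, assuming the Serilog and Serilog.Extensions.Logging packages are installed (the sink configuration and connection string are placeholders):
```csharp
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Logging;
using Serilog;

// Build a Serilog logger and hand it to EF Core via an ILoggerFactory.
// EF Core will then write its messages, including client evaluation
// warnings, through Serilog.
Log.Logger = new LoggerConfiguration()
    .MinimumLevel.Information()
    .WriteTo.Console()
    .CreateLogger();

var loggerFactory = new LoggerFactory().AddSerilog();

optionsBuilder
    .UseSqlServer("<connection string>")
    .UseLoggerFactory(loggerFactory);
```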

Disable client evaluation in the development environment

I strongly suggest configuring EF to throw on client evaluation in the development environment, to make sure that you don’t unintentionally deploy such code to production.
// startup.cs

optionsBuilder
        .UseSqlServer("<connection string>")
        .ConfigureWarnings(warnings => warnings
            .Throw(RelationalEventId.QueryClientEvaluationWarning));

Testing

EF’s in-memory database isn’t a relational database

EF Core comes with an in-memory database that you can use for testing your services and APIs. It’s super useful, but keep in mind that it isn’t a relational database and thus doesn’t enforce the constraints of one. This includes index and foreign key constraints, so, for example, deleting an entity can succeed in the in-memory database where it would fail in an actual relational database.
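Setting it up takes only a few lines, assuming the Microsoft.EntityFrameworkCore.InMemory package and a hypothetical AppDbContext; just keep the caveat about missing relational constraints in mind:
```csharp
using Microsoft.EntityFrameworkCore;

// Each database name identifies a separate in-memory store, so using
// a unique name per test keeps tests isolated from each other.
var options = new DbContextOptionsBuilder<AppDbContext>()
    .UseInMemoryDatabase(databaseName: "OrderTests")
    .Options;

using (var context = new AppDbContext(options))
{
    // Exercise your services and queries against the in-memory
    // context here; no real database is needed.
}
```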

Consider using SQLite In-memory

To overcome the issues discussed above with EF’s own in-memory database, you can use SQLite in-memory instead. It’s probably not as fast or robust, but it can help you catch database issues that would otherwise go unnoticed.
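A common pattern, sketched below with the Microsoft.EntityFrameworkCore.Sqlite package and a hypothetical AppDbContext, is to open a connection to ":memory:" yourself and keep it open for the lifetime of the test, since the in-memory database only exists as long as that connection stays open:
```csharp
using Microsoft.Data.Sqlite;
using Microsoft.EntityFrameworkCore;

// The in-memory SQLite database lives only while this connection is
// open, so open it before creating the context and keep it open.
var connection = new SqliteConnection("DataSource=:memory:");
connection.Open();
try
{
    var options = new DbContextOptionsBuilder<AppDbContext>()
        .UseSqlite(connection)
        .Options;

    using (var context = new AppDbContext(options))
    {
        context.Database.EnsureCreated(); // create the schema

        // Run tests here; relational constraints such as foreign
        // keys are actually enforced, unlike with the EF Core
        // in-memory provider.
    }
}
finally
{
    connection.Close();
}
```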